Re: What's the deal with dataimporthandler overwriting indexes?

2019-02-12 Thread Elizabeth Haubert
I've run into this also; it is a key difference between a master/slave
setup and a SolrCloud setup.

clean=true has always deleted the index on the first commit, but in older
versions of Solr, the workaround was to disable replication until the full
reindex had completed.

This is a convenient practice for a number of reasons, especially for small
indices.  It really isn't supported in SolrCloud, because of the difference
in how writes are processed for master/slave vs. SolrCloud.  With a
master/slave setup, all writes go to the same location, so disabling
replication lets you buffer them up all in one go.  With a SolrCloud
setup, the data is distributed across the nodes in the cluster.  So Solr
would need to know to blow away the index at the leader of each shard to
support the 'clean', serve traffic only from the other replicas of each
shard until the re-index completes, do the replications, and then resume
normal operation.

Note that in Solr 7.x if you revert to the master/slave setup, you need to
disable polling at the slaves.  Disabling replication at the master will
also cause index deletion at the slaves (SOLR-11938).
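In request terms, that old workflow was just three calls; here is a sketch in Python (host, core name, and parameter choices are illustrative, not from this thread):

```python
# Sketch of the pre-SolrCloud "disable replication, reindex, re-enable"
# workflow described above. Host and core names are placeholders; on 7.x,
# also disable polling on the slaves first (see the SOLR-11938 note above).
base = "http://master:8983/solr/core1"

reindex_steps = [
    base + "/replication?command=disablereplication",                 # stop serving new index versions
    base + "/dataimport?command=full-import&clean=true&commit=true",  # clean=true wipes on first commit
    base + "/replication?command=enablereplication",                  # resume once the rebuild is done
]

for url in reindex_steps:
    print(url)
```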

Elizabeth

On Tue, Feb 12, 2019 at 11:42 AM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi!
> If clean=true then the index will be replaced completely by the new import.
> That is how it is supposed to work.
> If you don't want to preemptively delete your index, set clean=false. And
> set commit=true instead of optimize=true.
> Are you sure about optimize? Do you really need it? Usually it's very
> costly.
> So, I'd try:
> dataimport?command=full-import&clean=false&commit=true
>
> If nevertheless nothing is imported, please check the log.
> --
> Vadim
>
>
>
> > -Original Message-
> > From: Joakim Hansson [mailto:joakim.hansso...@gmail.com]
> > Sent: Tuesday, February 12, 2019 12:47 PM
> > To: solr-user@lucene.apache.org
> > Subject: What's the deal with dataimporthandler overwriting indexes?
> >
> > Hi!
> > We are currently upgrading from a Solr 6.2 master/slave setup to Solr 7.6
> > running SolrCloud.
> > I don't know if I've missed something really trivial, but every time I
> > start a full import (dataimport?command=full-import&clean=true&optimize=true)
> > the old index gets overwritten by the new import.
> >
> > In 6.2 this wasn't really a problem since I could disable replication in
> > the API on the master and enable it once the import was completed.
> > With 7.6 and solrcloud we use NRT-shards and replicas since those are the
> > only ones that support rule-based replica placement, and whenever I start
> > a new import the old index is overwritten all over the solrcloud cluster.
> >
> > I have tried changing to clean=false, but that makes the import finish
> > without adding any docs.
> > Doesn't matter if I use soft or hard commits.
> >
> > I don't get the logic in this. Why would you ever want to delete an
> > existing index before there is a new one in place? What is it I'm missing
> > here?
> >
> > Please enlighten me.
>
>


Re: Per-field slop param in eDisMax

2019-01-24 Thread Elizabeth Haubert
To do this you specify the slop on each field when you specify the
pf/pf2/pf3 parameters:
pf=fieldA~2 fieldB~5

I'll try to add an example to the documentation here:
https://lucene.apache.org/solr/guide/7_6/the-extended-dismax-query-parser.html#using-slop
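For example, a full request might carry parameters like these (the collection path and field names are made up):

```python
from urllib.parse import urlencode

# Hypothetical eDisMax request using per-field phrase slop in pf:
# slop 2 on fieldA, slop 5 on fieldB. All names are placeholders.
params = {
    "defType": "edismax",
    "q": "aaa bbb",
    "qf": "fieldA fieldB",
    "pf": "fieldA~2 fieldB~5",
}
print("/solr/collection1/select?" + urlencode(params))
```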

Elizabeth

On Wed, Jan 23, 2019 at 10:30 PM Yasufumi Mizoguchi 
wrote:

> Hi,
>
> I am struggling to set per-field slop param in eDisMax query parser with
> Solr 6.0 and 7.6.
> What I want to do with eDisMax is similar to the following in the default
> query parser.
>
> * Query string : "aaa bbb"
> * Target fields : fieldA(TextField), fieldB(TextField)
>
> q=fieldA:"aaa bbb"~2 OR fieldB:"aaa bbb"~5
>
> Anyone have good ideas?
>
> Thanks,
> Yasufumi.
>


Re: Search query with & without question mark

2019-01-14 Thread Elizabeth Haubert
Because the standard query parser treats '?' as a single-character wildcard:
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html

So in the first case, q="how do I add a field", the word "field" in your
document matches.  In the second case, q="how do I add a field?", it is
looking for tokens like "fields" or "fielde"; the term without a trailing
one-character suffix doesn't match anymore.  That is why it is no longer
included in the scoring.

https://lucene.apache.org/solr/guide/7_6/the-standard-query-parser.html#wildcard-searches
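The same '?' semantics show up in shell-style globbing, which makes for a quick illustration of why the trailing '?' drops the exact term:

```python
from fnmatch import fnmatch

# '?' must consume exactly one character, so the bare token "field"
# no longer matches the pattern "field?".
assert fnmatch("fields", "field?")
assert fnmatch("fielde", "field?")
assert not fnmatch("field", "field?")
print("'field?' matches 'fields' and 'fielde' but not 'field'")
```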

Elizabeth


On Mon, Jan 14, 2019 at 2:07 AM Jay Potharaju  wrote:

> the parsedquery is same when debugging, but when calculating the scores
> different fields are being taken into consideration. Why would that be the
> case? My guess is that the suggeststopfilterfactory is not working as i
> expect it to and causing this weird situation.
>
> Updated field type definition:
>   <
> charFilter class="solr.PatternReplaceCharFilterFactory" pattern=
> "['!#\$%'\(\)\*+,-\./:;=?@\[\]\^_`{|}~!@#$%^*]" />  "solr.StandardTokenizerFactory"/>  "solr.SuggestStopFilterFactory" ignoreCase="true" words=
> "lang/stopwords_en.txt" />  <
> filter class="solr.EnglishPossessiveFilterFactory"/>  "solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
>  fieldType>
>
> Debug Query:
> *"rawquerystring":"how do i add a field"*,
> "querystring":"how do i add a field",
> "parsedquery":"(+(DisjunctionMaxQuery((topic_title_plain:how))
> DisjunctionMaxQuery((topic_title_plain:do))
> DisjunctionMaxQuery((topic_title_plain:i))
> DisjunctionMaxQuery((topic_title_plain:add))
> DisjunctionMaxQuery((topic_title_plain:a))
> DisjunctionMaxQuery((topic_title_plain:field/no_coord",
> "parsedquery_toString":"+((topic_title_plain:how)
> (topic_title_plain:do) (topic_title_plain:i) (topic_title_plain:add)
> (topic_title_plain:a) (topic_title_plain:field))",
> "explain":{
>   "1":"
> 6.1034017 = sum of:
>   2.0065408 = weight(topic_title_plain:add in 107) [SchemaSimilarity],
> result of:
> 2.0065408 = score(doc=107,freq=1.0 = termFreq=1.0
> ), product of:
>   2.1391609 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq + 0.5)) from:
> 32.0 = docFreq
> 275.0 = docCount
>   0.9380037 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b + b * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 3.4436364 = avgFieldLength
> 4.0 = fieldLength
>   4.096861 = weight(topic_title_plain:field in 107) [SchemaSimilarity],
> result of:
> 4.096861 = score(doc=107,freq=1.0 = termFreq=1.0
> ), product of:
>   4.367638 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq + 0.5)) from:
> 3.0 = docFreq
> 275.0 = docCount
>   0.9380037 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b + b * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 3.4436364 = avgFieldLength
> 4.0 = fieldLength
> "},
>
> *rawquerystring":"how do i add a field?",*
> "querystring":"how do i add a field?",
> "parsedquery":"(+(DisjunctionMaxQuery((topic_title_plain:how))
> DisjunctionMaxQuery((topic_title_plain:do))
> DisjunctionMaxQuery((topic_title_plain:i))
> DisjunctionMaxQuery((topic_title_plain:add))
> DisjunctionMaxQuery((topic_title_plain:a))
> DisjunctionMaxQuery((topic_title_plain:field/no_coord",
> "parsedquery_toString":"+((topic_title_plain:how)
> (topic_title_plain:do) (topic_title_plain:i) (topic_title_plain:add)
> (topic_title_plain:a) (topic_title_plain:field))",
> "explain":{
>   "2":"
> 3.798876 = sum of:
>   2.033249 = weight(topic_title_plain:how in 202) [SchemaSimilarity],
> result of:
> 2.033249 = score(doc=202,freq=1.0 = termFreq=1.0
> ), product of:
>   2.4634004 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq + 0.5)) from:
> 23.0 = docFreq
> 275.0 = docCount
>   0.82538307 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1
> - b + b * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 3.4436364 = avgFieldLength
> 5.2244897 = fieldLength
> *  1.7656271 = weight(topic_title_plain:add in 202) [SchemaSimilarity],
> result of:*
> 1.7656271 = score(doc=202,freq=1.0 = termFreq=1.0
> ), product of:
>   2.1391609 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq + 0.5)) from:
> 32.0 = docFreq
> 275.0 = docCount
>   0.82538307 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1
> - b + b * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 3.4436364 = avgFieldLength
> 5.2244897 = fieldLength
> "},
> Thanks
> Jay
>
>
>
> 
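As a sanity check, the numbers in the first explain entry above can be reproduced from the printed BM25 formulas (the statistics are copied from the explain output; only the arithmetic is mine):

```python
import math

# Statistics from the weight(topic_title_plain:add in 107) entry above.
doc_freq, doc_count = 32.0, 275.0
freq, k1, b = 1.0, 1.2, 0.75
field_length, avg_field_length = 4.0, 3.4436364

idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
tf_norm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))

# idf * tf_norm reproduces the 2.0065408 weight (up to float rounding).
print(idf, tf_norm, idf * tf_norm)
```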

Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Elizabeth Haubert
Hello,

To a certain extent, I agree with Erick that this isn't a problem, but it
looks like one.  The nature of TF*IDF is such that you will see different
scores for the same query over time on the same replica, or on different
replicas for the same query, with most replication schemes. This is mildly
annoying when the score is displayed to the user, although I have found
most end users do not pay that much attention to the floating-point score.
Testers do.  On a small index with high write/delete traffic and homogeneous
docs, I've seen it cause document re-orderings when the same query is
repeated and sent to different replicas, such as for paging, and that is
noticeable to end users.

How big is your index, and how different are the percentages you are
seeing?  This is a much more pronounced problem on smaller indices; it is
possible this is a problem with your test setup, but not production.
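To see why small indices are more sensitive: the idf term depends on per-replica docCount/docFreq, which drift as deletes and merges differ between replicas. A toy illustration (the statistics are invented):

```python
import math

def bm25_idf(doc_count, doc_freq):
    # idf exactly as Solr prints it in explain output
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Two replicas whose deleted-doc counts have drifted slightly
# (invented numbers): the same term now scores differently on each.
replica_a = bm25_idf(275, 32)
replica_b = bm25_idf(260, 30)
print(replica_a, replica_b)
```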

Your solution of directing users to a consistent replica will solve the
change in values over a session-sized window of time.  With a single
shard, you could use a master/slave setup and direct queries at a given
slave.  This has a number of operational consequences, though, as it means
you will lose the benefits of SolrCloud.

Mikhail's suggestion to use ExactStats would be cleaner:
https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_


Elizabeth

On Fri, Jan 11, 2019 at 3:56 AM Ashish Bisht 
wrote:

> Hi Erick,
>
> Your statement "*At best, I've seen UIs where they display, say, 1 to 5
> stars that are just showing the percentile that the particular doc had
> _relative to the max score*" is something we are trying to achieve, but we
> are dealing in percentages rather than stars (ratings).
>
> Change in maxScore per node is messing it up.
>
> I was thinking whether it is possible to make one complete request (for a
> term) go through one replica, i.e. if we could tell the client which replica
> hit the first request, subsequent paginated requests should go through that
> replica until the keyword is changed. Do you think it is possible or a good
> idea? If yes, is there a way in Solr to know which replica served a request?
>
> Regards
> Ashish
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Integrate nutch with solr

2018-10-23 Thread Elizabeth Haubert
Hi Dinesh,

This article is quite old (Nutch 1.x, Solr 4.x), but the high-level steps
are still pretty much the same: get your Java set up, kick off a Solr
instance, and then fire off your crawler.

If you are starting from scratch on both Solr and Nutch, I'd recommend
getting your Solr sandbox set up first.  The directions for setting up your
Solr collection up are not specific to Nutch, and will be in the Solr
documentation.  The directions for setting up your crawler will be in the
Nutch documentation.

Good luck!
Elizabeth





On Thu, Oct 18, 2018 at 2:36 PM Dinesh Sundaram 
wrote:

> Hi Team,
> Can you please share the steps to integrate nutch 2.3.1 with solrcloud
> 7.1.0.
>
>
> Thanks,
> Dinesh Sundaram
>


Re: Weird behavioural differences between pf in dismax and edismax

2018-05-29 Thread Elizabeth Haubert
That would make sense.
Multi-term synonyms get into a weird case too.  Should the single-term
words that have multi-term synonyms expand out? Or should the multi-term
synonyms that have single-term synonyms contract down and count as only a
single clause for pf2 or pf3?



On Tue, May 29, 2018 at 1:37 PM, Alessandro Benedetti 
wrote:

> I don't have any hard position on this, It's ok to not build a phrase boost
> if the input query is 1 term and it remains one term after the analysis for
> one of the pf fields.
>
> But if the term produces multiple tokens after query time analysis, I do
> believe that building a phrase boost should be the correct interpretation (
> e.g. wi-fi with a query time analiser which split by - ) .
>
> Cheers
>
>
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Weird behavioural differences between pf in dismax and edismax

2018-05-29 Thread Elizabeth Haubert
I disagree that a phrase of 1-word is just a phrase.  That is the core
difference between the qf and pf clauses.  Qf is collecting the terms; pf
is boosting the combinations.

For queries where the original query phrase has only a single term in it,
then it might be a moot point, unless the clauses are being pointed at
different fields or different boosts.

But for multi-term queries, pf (and pf2 and pf3) can be important
differentiators between documents that just happen to have enough words
from the user's original query, and documents that get closer to the user's
meaning.  It balances the documents that have enough terms per mm and
documents that have enough terms in one field.

Elizabeth Haubert






On Tue, May 29, 2018 at 5:14 AM, Alessandro Benedetti 
wrote:

> In my opinion, given the definition of dismax and edismax query parsers,
> they
> should behave the same for parameters in common.
> To be a little bit extreme, I don't think we need the dismax query parser at
> all anymore (in the end, edismax only offers more than dismax).
>
> Finally, I do believe that even if the query is a single term ( before or
> after the analysis for a PF field) it should anyway boost the phrase.
> A phrase of 1 word is still a phrase, isn't it ?
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?

2018-04-19 Thread Elizabeth Haubert
An update on this:

The problem occurs on phrase queries, using edismax, where the term in the
nested query contains a multi-word synonym.
In the example above,  dog has a multiterm synonym "canis familiaris", and
aspirin has "acetylsalicylic acid".

Creating a JIRA ticket.

Thank you,
Elizabeth


On Wed, Apr 18, 2018 at 12:38 PM, Elizabeth Haubert <
ehaub...@opensourceconnections.com> wrote:

> I'm seeing pf and pf3 clauses fail to generate in long queries containing
> synonyms.  Wondering if anyone else has run into this, or if it needs to be
> submitted as a bug in Jira.   It is a showstopper problem for the current
> project, as the pf and pf3 were pretty heavily tuned.
>
> Using Solr 7.1; all fields are using the following type:
>
> With query-time synonyms:
> <fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <charFilter ... pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
>     <tokenizer .../>
>     <filter ... generateWordParts="1" generateNumberParts="1" catenateWords="1"
>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>       stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
>     <filter ... words="stopwords.txt" />
>     <filter ... protected="protwords_nostem.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter ... pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
>     <tokenizer .../>
>     <filter ... generateWordParts="1" generateNumberParts="1" catenateWords="0"
>       catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
>       stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
>     <filter ... words="stopwords.txt" />
>     <filter ... managed="synonyms_all" />
>     <filter ... protected="protwords_nostem.txt"/>
>   </analyzer>
> </fieldType>
>
> Without query-time synonyms:
> <fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <charFilter ... pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
>     <tokenizer .../>
>     <filter ... generateWordParts="1" generateNumberParts="1" catenateWords="1"
>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>       stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
>     <filter ... words="stopwords.txt" />
>     <filter ... managed="synonyms_all" />
>     <filter ... protected="protwords_nostem.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter ... pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
>     <tokenizer .../>
>     <filter ... generateWordParts="1" generateNumberParts="1" catenateWords="0"
>       catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
>       stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
>     <filter ... words="stopwords.txt" />
>     <filter ... protected="protwords_nostem.txt"/>
>   </analyzer>
> </fieldType>
>

PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?

2018-04-18 Thread Elizabeth Haubert
I'm seeing pf and pf3 clauses fail to generate in long queries containing
synonyms.  Wondering if anyone else has run into this, or if it needs to be
submitted as a bug in Jira.   It is a showstopper problem for the current
project, as the pf and pf3 were pretty heavily tuned.

Using Solr 7.1; all fields are using the following type:

With query-time synonyms:
[fieldType definition stripped from the archived message]

Without query-time synonyms:
[fieldType definition stripped from the archived message]

Synonyms file is pretty long, so I'll just include the relevant bits for an
example:

allergic, hypersensitive
aspirin, acetylsalicylic acid
dog, canine, canis familiris, k 9
rat, rattus


The problem seems to occur when part of the query has a synonym, but the
whole phrase does not.  Whitespace added to piece out what is going on; I
believe any parentheses errors are due to my tinkering around.  Beyond that,
though, this is as it came from Solr.  Slop has been tinkered with to
identify PF/PF2/PF3 clauses: PF fields have a slop ending in 0, pf2 ending
in 1, pf3 ending in 2, e.g. ~10, ~11, ~12, etc.

=
Example 1:  "aspirin dose in rats"
==

With query-time synonyms:
===
/// Q terms generate as expected ///
+kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 |
(species:\"acetylsalicylic acid\" species:aspirin) |
(keywords_bm25_no_norms:\"acetylsalicylic acid\"
keywords_bm25_no_norms:aspirin)^50.0 | (description:\"acetylsalicylic
acid\" description:aspirin) | (kw1ranked:\"acetylsalicylic acid\"
kw1ranked:aspirin)^100.0 | (text:\"acetylsalicylic acid\" text:aspirin) |
(title:\"acetylsalicylic acid\" title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:\"acetylsalicylic acid\"
keywordsranked_bm25_no_norms:aspirin)^50.0 | (authors:\"acetylsalicylic
acid\" authors:aspirin))~0.4 ((Synonym(kw1:dosage kw1:dose kw1:dose
kw1:dose))^100.0 | Synonym(species:dosage species:dose species:dose
species:dose) | (Synonym(keywords_bm25_no_norms:dosage
keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose
keywords_bm25_no_norms:dose))^50.0 | Synonym(description:dosage
description:dose description:dose description:dose) |
(Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose
kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose)
| (Synonym(title:dosage title:dose title:dose title:dose))^100.0 |
(Synonym(keywordsranked_bm25_no_norms:dosage
keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose
keywordsranked_bm25_no_norms:dose))^50.0 | Synonym(authors:dosage
authors:dose authors:dose authors:dose))~0.4 ((Synonym(kw1:rat
kw1:rattu))^100.0 | Synonym(species:rat species:rattu) |
(Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 |
Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat
kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat
title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat
keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat
authors:rattu))~0.4)~3)

/// PF and PF2 are missing. ///
 () () () () ()

/// This is actually PF3 with a missing ? where the stopword 'in' belonged.
///
 ((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu
rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu
rat)\"~22)^100.0)~0.4 ((keywords_bm25_no_norms:\"(dosage dose dose dose)
(rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(14560),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:
===

/// Q ///
 "boost(+kw1:aspirin)^100.0 | species:aspirin |
(keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin |
(kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4
((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 |
description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0
| (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4
((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 |
description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0
| (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)
/// PF  ///
  ((title:\"aspirin dose ? rats\"~20)^5000.0 |
(keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 |
(keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 |
(text:\"aspirin dose ? rats\"~20)^1000.0)~0.4 ((kw1ranked:\"aspirin dose ?
rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4
((authors:\"aspirin dose ? rats\")^250.0 | description:\"aspirin dose ?
rats\")~0.4

/// PF2 ///
  ((text:\"aspirin dose ? rats\"~100)^500.0)~0.4 (authors:\"aspirin
dose\"~11 | species:\"aspirin dose\"~11)~0.4

/// PF3 ///
(((title:\"aspirin dose\"~22)^1000.0 |

dataimporthandler ignoring configured timezone for indexStartTime?

2018-03-02 Thread Elizabeth Haubert
I'm getting incorrect reported time deltas on the admin console for
"indexing since" and "started".  It looks like DIH is converting the last
start time to UTC:

Last Update: 09:57:15

Indexing completed. Added/Updated: 94078 documents. Deleted 0 documents.
(Duration: 06s)

Requests: 1 , Fetched: 94,078 15,680/s, Skipped: 0 , Processed: 94,078
15,680/s

Started: about 5 hours ago


Server is configured for the EST timezone.

Timezone is set in solr.in.sh:
# By default the start script uses UTC; override the timezone if needed
SOLR_TIMEZONE="EST"

The DIH propertyWriter specifies the timezone in its date format:
[propertyWriter element stripped from the archived message]

And timezone is actually being written out in dataimport.properties:
#Fri Mar 02 09:55:11 EST 2018
last_index_time=2018-03-02 09\:55\:06 EST
autosuggest.last_index_time=2018-03-02 09\:55\:06 EST


The code in DataImporter.doFullImport() looks like it is pulling the start
time directly from the PropertyWriter, so I'm a little stuck on what else
needs to be configured here.


  public void doFullImport(DIHWriter writer, RequestInfo requestParams) {
    LOG.info("Starting Full Import");
    setStatus(Status.RUNNING_FULL_DUMP);
    try {
      DIHProperties dihPropWriter = createPropertyWriter();
      setIndexStartTime(dihPropWriter.getCurrentTimestamp());
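For what it's worth, the reported "about 5 hours ago" is exactly the EST-to-UTC wall-clock shift; a quick sketch (assuming EST = UTC-5):

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # assuming EST = UTC-5

# The start time as written to dataimport.properties:
started = datetime(2018, 3, 2, 9, 55, 6, tzinfo=EST)

# If the same wall-clock value is re-read as if it were UTC,
# the "started ... ago" delta grows by five hours.
misread_as_utc = started.replace(tzinfo=timezone.utc)
print(started.astimezone(timezone.utc) - misread_as_utc)
```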



Suggestions?


Thank you,

Elizabeth


Expected response from master when replication is disabled?

2018-02-02 Thread Elizabeth Haubert
What is the intended use case and behavior of disablereplication?

My expectation was that it would cause the master to stop responding to
slave requests.

Working with a master/slave Solr 7.1 setup; slave is set up to poll every
60s. I would prefer to remain on M/S for immediate future, while we are
resolving instability in the ingestion pipeline external to Solr, which
causes bad data to be ingested.

Testing failure scenarios related to data corruption, I disabled replication
at the master:

curl "http://$SOLR_NODE/solr/$COLLECTION/replication?command=disablereplication"

but did not disable polling at the slaves.

2018-02-02 14:05:13.775 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's generation: 0
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's version: 0
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's generation: 12254
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's version: 1517493295318
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher New index in Master. Deleting mine...

And the slave did indeed delete all its documents.
Master's generation at the time was also 12254.
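The log lines show the fetcher's generation check going wrong; here is a simplified sketch of that decision (an illustration, not the actual IndexFetcher code):

```python
def slave_action(master_generation, slave_generation):
    # Simplified model of the IndexFetcher decision visible in the logs.
    if master_generation == 0:
        # With replication disabled, the master reports generation 0,
        # which the slave reads as "new (empty) index on master".
        return "delete local index"
    if master_generation == slave_generation:
        return "in sync"
    return "fetch changed files"

print(slave_action(0, 12254))      # the failure case logged above
print(slave_action(12258, 12258))  # normal polling once re-enabled
```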

Re-enabled replication at the master, slave caught back up again :
Normal behavior when master and slave are in sync, the slave polls:
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's generation: 12258
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's version: 1517595712709
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's generation: 12258
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's version: 1517595712709

Things that delete the index are big glaring problems, but I'm not clear if
this is a Solr bug or user error.

Liz