Re: Highlighting values of non stored fields

2020-06-14 Thread mosh bla



Thanks Erick, that was indeed my problem, and you helped me understand how the hl 
component works. I still can't see how to avoid storing every variation of a 
field, though. For example, to support morphological search I 
have 2 fields:



Say we indexed the following doc:
{
    "doc_text": "walking dead"
}
Following queries should match:
q = walking
q = walk
I am issuing an edismax query with qf="doc_text^2 doc_text_morph" (boosts are 
currently missing) and adding highlight params. 'walk' will be matched on 
doc_text_morph, but will only be highlighted if doc_text_morph is stored (there 
is no match on the stored field doc_text...). Is there any way to get it 
highlighted without also storing the doc_text_morph field?
Thanks again...
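
For reference, here is a minimal Python sketch that only reproduces the request described above; it does not solve the highlighting problem. The host, port and collection name (localhost:8983, mycollection) are placeholders that do not come from this thread, and build_highlight_query is a hypothetical helper name.

```python
from urllib.parse import urlencode

def build_highlight_query(base_url, terms):
    """Build a /select URL for an edismax query that highlights doc_text."""
    params = {
        "defType": "edismax",
        "q": terms,
        # Search both the plain field and the morphological variant.
        "qf": "doc_text^2 doc_text_morph",
        "hl": "true",
        # Only the stored field is requested for highlighting.
        "hl.fl": "doc_text",
    }
    return base_url + "/select?" + urlencode(params)

# Placeholder base URL, not taken from the thread.
url = build_highlight_query("http://localhost:8983/solr/mycollection", "walk")
print(url)
```

urlencode takes care of escaping the space and the ^ in qf, which is easy to get wrong when pasting the URL by hand.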
 
 
 
 

Sent: Monday, June 08, 2020 at 3:39 PM
From: "Erick Erickson" 
To: solr-user@lucene.apache.org
Subject: Re: Highlighting values of non stored fields
When highlighting, the stored data for the field is re-analyzed against the 
query based on the field you’re highlighting. My bet is that if you query just 
“q=doc_text:mosh” you will not get a hit. Check your text_ws fieldType, it’s 
probably case sensitive. So if you changed the doc_text type to text_general 
(the same as your dynamic field), I think you’d be fine. Re-index your data, of 
course.

I’ll add, by the by, that text_ws is fairly restricted and is rarely useful 
for searching on anything humans have to key in. It’ll include punctuation, for 
instance: input like “dog dog.” will produce two tokens, one with a period 
in the token and one without. It’s most useful for heavily preprocessed data 
where the app normalizes the input, or for machine-generated input.
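
To make the tokenization point concrete, here is a toy Python sketch. It is not Solr's analysis chain: whitespace_tokens only imitates splitting on whitespace, and lowercase_tokens is a stand-in for adding a lowercasing step. Both function names are made up for illustration.

```python
def whitespace_tokens(text):
    """Split on whitespace only; case and punctuation are kept as-is."""
    return text.split()

def lowercase_tokens(text):
    """Split on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]

# Punctuation stays attached, so these are two different tokens.
print(whitespace_tokens("dog dog."))

# A case-sensitive field indexed with "MOSH" has no token "mosh".
print("mosh" in whitespace_tokens("MOSH"))

# With a lowercasing step the same query term matches.
print("mosh" in lowercase_tokens("MOSH"))
```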

There’s no reason, BTW, to index your doc_text for highlighting purposes since 
the stored data is what counts. Unless, of course, you want to search on that 
field specifically.

Best,
Erick

> On Jun 7, 2020, at 11:32 PM, mosh bla  wrote:
>
>
> Thanks Erick for the reply. Your answer is exactly what I was expecting from 
> the highlight component, but it seems I am getting different behaviour.
> I'll try to give a simple example and I hope you can explain where my 
> mistake is.
> Say I have the following fields configuration:
> 
> 
> 
>
> And I indexed the following document:
> {
> "doc_text": "MOSH"
> }
>
> When executing the following query 
> "http://.../select?q=doc_text_lw:mosh&hl=true&hl.fl=doc_text" - the document 
> is matched and returned in the response, but the highlighted fragment is empty.
> I also tried changing the 'hl.method' param to 'unified' and 'fastVector', but 
> no luck either. My conclusion was that the 'hl.fl' param should be set to 
> 'doc_text_lw' and that it must also be stored...
>
>
>
>
> Sent: Tuesday, June 02, 2020 at 3:15 PM
> From: "Erick Erickson" 
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting values of non stored fields
> Why do you think even variants need to be stored/highlighted? Usually
> when you store variants for ranking purposes those extra copies are
> invisible to the user. So most often people store exactly one copy
> of a particular field and highlight _that_ field in the return.
>
> So say my field is f1 and I have indexed f1_1, f1_2, f1_3. I just store
> f1_1 and return the highlighted text from that one.
>
> You could even just store the data only once in a field that’s never
> indexed and return/highlight that if you wanted.
>
> Best,
> Erick
>
>> On Jun 2, 2020, at 3:24 AM, mosheB  wrote:
>>
>> Our use case is as follows:
>> We are indexing free text documents. Each document contains metadata fields
>> (such as author, creation date...) which are kinda small, and one "big"
>> field that holds the document's text itself.
>>
>> For ranking purposes each field is indexed in more than one "variation" and
>> the query is executed with the edismax query parser. Things are working
>> alright, but now a new feature is requested by the customer - highlighting.
>> To enable highlighting every field must be stored, including all variations
>> of the big text field. This pushes our storage to the limit (and probably
>> the document cache...) and feels a bit redundant, as the stored value is
>> duplicated n times... Is there any way to “reference” stored value from one
>> field to another?
>> For example:
>> Say we have the following config:
>> > />
>> > />
>>
>> 
>> 
>> 
>>
>> And we execute the following query:
>> http://.../select?defType=edismax&q=desired_terms&qf=doc_text^2
>> doc_text_bigrams^3
>> doc_text_phrases^4&hl=on&hl.fl=doc_text,doc_text_bigrams,doc_text_phrases
>>
>> Highlight fragments in response will be blank if match occurred on the
>> non-stored fields (doc_text_bigrams or doc_text_phrases). Is it possible to
>> pass extra parameter to the highlight component, to point it to the stored
>> data of the “original” doc_text field? a kind of “stored value reference
>> field”?
>>
>> Thanks in advance.
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
 


Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-14 Thread Aroop Ganguly
Isabelle, sometimes 401s are a red herring for other issues unrelated to auth.
We have had cases on 7.7 where, during a transient replica recovery and/or 
leader-down situation, the only message we got back from Solr was a 401.
Please check whether you have any down replicas, or other issues where certain 
nodes may have trouble getting more current information from ZooKeeper.
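
One way to check for down replicas is to look at the output of the Collections API CLUSTERSTATUS action. The Python sketch below is only an illustration: it assumes the JSON response has the cluster / collections / shards / replicas nesting shown in the sample, the sample data itself is invented, and non_active_replicas is a hypothetical helper name.

```python
def non_active_replicas(cluster_status):
    """Return (collection, shard, replica, state) for replicas not 'active'."""
    found = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state")
                if state != "active":
                    found.append((coll_name, shard_name, replica_name, state))
    return found

# Invented sample shaped like the assumed CLUSTERSTATUS layout.
sample = {
    "cluster": {
        "collections": {
            "test1": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node2": {"state": "active"},
                            "core_node4": {"state": "down"},
                        }
                    }
                }
            }
        }
    }
}

print(non_active_replicas(sample))
```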


> On Jun 14, 2020, at 2:13 PM, Isabelle Giguere wrote:
> 
> I have created https://issues.apache.org/jira/browse/SOLR-14569 
> 
> It includes a patch with the unit test to reproduce the issue, and a 
> simplification of our product-specific configuration, with instructions.
> 
> Let's catch up on Jira.
> 
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
> 
> 
> 
> From: Jan Høydahl <jan@cominvent.com>
> Sent: June 13, 2020 17:50
> To: solr-user
> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
> 
> I did not manage to reproduce. Feel free to open the JIRA and attach the 
> failing test. In the issue description, it would be great if you could describe 
> the reproduction steps in a clean way, so anyone can reproduce with a minimal 
> necessary config.
> 
> Jan
> 
>> On Jun 13, 2020, at 00:41, Isabelle Giguere 
>> <igigu...@opentext.com.INVALID> wrote:
>> 
>> Hello again;
>> 
>> I have managed to reproduce the issue in a unit test.  I should probably open 
>> a Jira ticket with a patch for the unit test, on Solr 8.5.0, not master.
>> 
>> Meanwhile, for your suggested queries:
>> 
>> 1.  Query on the collection:
>> 
>> curl -i -u admin:admin 
>> https://urldefense.com/v3/__http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml__;Kio!!Obbck6kTJA!LvZRdkAwPGTDqWqS-BYMmyuuwAp9coGzkDzz5BG7hTCLmCSV2bOZBM9A7JzikWgk$
>>  
>> 
>> HTTP/1.1 200 OK
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
>> 'self'; worker-src 'self';
>> X-Content-Type-Options: nosniff
>> X-Frame-Options: SAMEORIGIN
>> X-XSS-Protection: 1; mode=block
>> Content-Type: application/xml; charset=UTF-8
>> Content-Length: 8214
>> 
>> 
>> 
>> 
>> 
>> true
>> 0
>> 2
>> 
>>   *:*
>> 
>> 
>> 
>> Response contains the Solr document, of course
>> 
>> 
>> 2. Query on the alias
>> 
>> curl -i -u admin:admin 
>> https://urldefense.com/v3/__http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml__;Kio!!Obbck6kTJA!LvZRdkAwPGTDqWqS-BYMmyuuwAp9coGzkDzz5BG7hTCLmCSV2bOZBM9A7PZyiHWo$
>>  
>> HTTP/1.1 401 Unauthorized
>> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
>> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
>> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
>> 'self'; worker-src 'self';
>> X-Content-Type-Options: nosniff
>> X-Frame-Options: SAMEORIGIN
>> X-XSS-Protection: 1; mode=block
>> Cache-Control: no-cache, no-store
>> Pragma: no-cache
>> Expires: Sat, 01 Jan 2000 01:00:00 GMT
>> Last-Modified: Fri, 12 Jun 2020 22:30:20 GMT
>> ETag: "172aaa7c1eb"
>> Content-Type: application/xml; charset=UTF-8
>> Content-Length: 1332
>> 
>> 
>> 
>> 
>> 
>> true
>> 401
>> 16
>> 
>>   *:*
>> 
>> 
>> 
>> Error contains the full html HTTP 401 message (with escaped characters, of 
>> course)
>> Gist of it : HTTP ERROR 401 require authentication
>> 
>> Thanks;
>> 
>> 
>> Isabelle Giguère
>> Computational Linguist & Java Developer
>> Linguiste informaticienne & développeur java
>> 
>> 
>> 
>> From: Jan Høydahl <jan@cominvent.com>
>> Sent: June 12, 2020 17:30
>> To: solr-user@lucene.apache.org
>> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>> 
>> I’d say, try the query with curl and enable http headers
>> 
>> curl -i --user admin:admin 
>> http://localhost:8983/solr/mycollection/select?q=*:* 
>> 
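
The collection-versus-alias comparison suggested above with curl can also be scripted. This is a rough Python sketch using only the standard library; basic_auth_header and status_for are hypothetical helper names, and nothing here is specific to Solr beyond sending an HTTP Basic Authorization header.

```python
import base64
import urllib.error
import urllib.request

def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def status_for(url, user, password):
    """Return the HTTP status code for a GET request with basic auth."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code

print(basic_auth_header("admin", "admin"))
```

Calling status_for once with the .../solr/mycollection/select?q=*:* URL and once with the .../solr/myalias/select?q=*:* URL, using the same credentials, would show whether the two endpoints really answer with different status codes.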

Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

2020-06-14 Thread Isabelle Giguere
I have created https://issues.apache.org/jira/browse/SOLR-14569
It includes a patch with the unit test to reproduce the issue, and a 
simplification of our product-specific configuration, with instructions.

Let's catch up on Jira.

Isabelle Giguère
Computational Linguist & Java Developer
Linguiste informaticienne & développeur java



From: Jan Høydahl
Sent: June 13, 2020 17:50
To: solr-user
Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr

I did not manage to reproduce. Feel free to open the JIRA and attach the 
failing test. In the issue description, it would be great if you could describe 
the reproduction steps in a clean way, so anyone can reproduce with a minimal 
necessary config.

Jan

> On Jun 13, 2020, at 00:41, Isabelle Giguere wrote:
>
> Hello again;
>
> I have managed to reproduce the issue in a unit test.  I should probably open 
> a Jira ticket with a patch for the unit test, on Solr 8.5.0, not master.
>
> Meanwhile, for your suggested queries:
>
>  1.  Query on the collection:
>
> curl -i -u admin:admin 
> https://urldefense.com/v3/__http://10.5.106.115:8985/solr/test1/select?q=*:*&wt=xml__;Kio!!Obbck6kTJA!LvZRdkAwPGTDqWqS-BYMmyuuwAp9coGzkDzz5BG7hTCLmCSV2bOZBM9A7JzikWgk$
> HTTP/1.1 200 OK
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Content-Type: application/xml; charset=UTF-8
> Content-Length: 8214
>
> 
> 
>
> 
>  true
>  0
>  2
>  
>*:*
>  
> 
> 
> Response contains the Solr document, of course
>
>
> 2. Query on the alias
>
> curl -i -u admin:admin 
> https://urldefense.com/v3/__http://10.5.106.115:8985/solr/test/select?q=*:*&wt=xml__;Kio!!Obbck6kTJA!LvZRdkAwPGTDqWqS-BYMmyuuwAp9coGzkDzz5BG7hTCLmCSV2bOZBM9A7PZyiHWo$
>  
> HTTP/1.1 401 Unauthorized
> Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 
> 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
> 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
> 'self'; worker-src 'self';
> X-Content-Type-Options: nosniff
> X-Frame-Options: SAMEORIGIN
> X-XSS-Protection: 1; mode=block
> Cache-Control: no-cache, no-store
> Pragma: no-cache
> Expires: Sat, 01 Jan 2000 01:00:00 GMT
> Last-Modified: Fri, 12 Jun 2020 22:30:20 GMT
> ETag: "172aaa7c1eb"
> Content-Type: application/xml; charset=UTF-8
> Content-Length: 1332
>
> 
> 
>
> 
>  true
>  401
>  16
>  
>*:*
>  
> 
> 
> Error contains the full html HTTP 401 message (with escaped characters, of 
> course)
> Gist of it : HTTP ERROR 401 require authentication
>
> Thanks;
>
>
> Isabelle Giguère
> Computational Linguist & Java Developer
> Linguiste informaticienne & développeur java
>
>
> 
> From: Jan Høydahl
> Sent: June 12, 2020 17:30
> To: solr-user@lucene.apache.org
> Subject: Re: [EXTERNAL] - Re: HTTP 401 when searching on alias in secured Solr
>
> I’d say, try the query with curl and enable http headers
>
> curl -i --user admin:admin http://localhost:8983/solr/mycollection/select?q=*:*
> curl -i --user admin:admin http://localhost:8983/solr/myalias/select?q=*:*
>
> Are you saying that you see a difference between the two? What are the 
> headers?
>
> Jan
>
>> On Jun 12, 2020, at 20:06, Isabelle Giguere wrote:
>>
>> Hi Jan
>>
>> Thank you for your time on this.
>>
>> If I send a /select request directly on the alias (/solr/test/select), the 
>> browser asks for credentials, but the Solr response returns status=401 and 
>> an html error message with "HTTP ERROR 401 require authentication"
>>
>> Obviously, my expectation was that some query results would be returned.
>>
>> Since you can't reproduce the issue, I have to assume it's a configuration 
>> issue.
>>
>> So, if I may, let me provide as much detail as I can about my setup.
>>
>> Can anyone see something wrong here, some incompatibility ?
>>
>> Solr 8.5.0
>>
>> solrconfig.xml
>> 7.1.0
>> 
>> 
>> 
>> 
>>   
>>   5
>>   5
>>   5
>>   
>>
>> schema.xml
>> version=1.6
>> Some warnings on start-up about Trie* fields and deprecated filters (we 
>> should fix that)
>>
>> security.json in Zookeeper, at the Solr ZK root (provided on this thread)
>> blockUnknown : (true|false) = no change in behavior for me, for this issue
>> forwardCredentials : (true|false) = no change in behavior for me, for this 
>> issue
>>
>> No SSL
>>
>> solr.in.sh
>> SOLR_AUTH_TYPE="basic"
>> SOLR_AUTHENTICATION_OPTS="-Dbasicauth=admin:admin"
>>
>> start command params:
>> solr 

Re: How to determine why solr stops running?

2020-06-14 Thread Ryan W
Thank you.  I pasted those settings at the end of my /etc/default/solr.in.sh
just now and restarted solr.  I will see if that fixes it.  Previously, I
had no settings at all in solr.in.sh except for SOLR_PORT.
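
As a quick sanity check on the heap size discussed in the quoted messages below, the MaxHeapSize value from the GC log flags can be converted to MiB. This is only a small Python sketch; max_heap_mib is a made-up helper name, and the flags string is a shortened copy of the CommandLine flags quoted below.

```python
import re

def max_heap_mib(command_line_flags):
    """Extract -XX:MaxHeapSize (in bytes) and convert it to MiB."""
    match = re.search(r"-XX:MaxHeapSize=(\d+)", command_line_flags)
    if match is None:
        return None
    return int(match.group(1)) / (1024 * 1024)

# Shortened copy of the flags from solr_gc.log quoted below.
flags = (
    "-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 "
    "-XX:MaxNewSize=134217728"
)

print(max_heap_mib(flags))  # 512.0
```

So the previous setup was running with a 512 MiB heap, compared with the SOLR_HEAP=8g in the settings I pasted.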

On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood 
wrote:

> 1. You have a tiny heap. 536 Megabytes is not enough.
> 2. I stopped using the CMS GC years ago.
>
> Here is the GC config we use on every one of our 150+ Solr hosts. We’re
> still on Java 8, but will be upgrading soon.
>
> SOLR_HEAP=8g
> # Use G1 GC  -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jun 11, 2020, at 10:52 AM, Ryan W  wrote:
> >
> > On Wed, Jun 10, 2020 at 8:35 PM Hup Chen  wrote:
> >
> >> I will check "dmesg" first, to find out any hardware error message.
> >>
> >
> > Here is what I see toward the end of the output from dmesg:
> >
> > [1521232.781785] [118857]48 118857   108785  677 201
> > 901 0 httpd
> > [1521232.781787] [118860]48 118860   108785  710 201
> > 881 0 httpd
> > [1521232.781788] [118862]48 118862   113063 5256 210
> > 725 0 httpd
> > [1521232.781790] [118864]48 118864   114085 6634 212
> > 703 0 httpd
> > [1521232.781791] [118871]48 118871   13968732323 262
> > 620 0 httpd
> > [1521232.781793] [118873]48 118873   108785  821 201
> > 792 0 httpd
> > [1521232.781795] [118879]48 118879   14026332719 263
> > 621 0 httpd
> > [1521232.781796] [118903]48 118903   108785  812 201
> > 771 0 httpd
> > [1521232.781798] [118905]48 118905   113575 5606 211
> > 660 0 httpd
> > [1521232.781800] [118906]48 118906   113563 5694 211
> > 626 0 httpd
> > [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or
> > sacrifice child
> > [1521232.782908] Killed process 117529 (httpd), UID 48,
> total-vm:675824kB,
> > anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> >
> > Is this a relevant "Out of memory" message?  Does this suggest an OOM
> > situation is the culprit?
> >
> > When I grep in the solr logs for oom, I see some entries like this...
> >
> > ./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
> > -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> > -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> > -XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
> > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> > -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
> > -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
> > -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
> > -XX:-OmitStackTraceInFastThrow
> > -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983
> /opt/solr/server/logs
> > -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
> > -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
> > -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> > -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> > -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
> > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> > -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
> > -XX:+UseParNewGC
> >
> > Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh".
> But I
> > think this is just a setting that indicates what to do in case of an OOM.
> > And if I look in that oom_solr.sh file, I see it would write an entry to
> a
> > solr_oom_kill log. And there is no such log in the logs directory.
> >
> > Many thanks.
> >
> >
> >
> >
> >> Then use some system admin tools to monitor that server,
> >> for instance, top, vmstat, lsof, iostat ... or simply install some nice
> >> free monitoring tool into this system, like monit, monitorix, nagios.
> >> Good luck!
> >>
> >> 
> >> From: Ryan W 
> >> Sent: Thursday, June 11, 2020 2:13 AM
> >> To: solr-user@lucene.apache.org 
> >> Subject: Re: How to determine why solr stops running?
> >>
> >> Hi all,
> >>
> >> People keep suggesting I check the logs for errors.  What do those
> errors
> >> look like?  Does anyone have examples of the text of a Solr oom error?
> Or
> >> the text of any other errors I should be looking for the next time solr
> >> fails?  Are there phrases I should grep for in the logs?  Should I be
> >> looking in the Solr logs for an OOM error, or in the Apache logs?
> >>
> >> There is nothing failing on the server except for solr -- at least not
> that
> >> I can see.  There is no apparent probl