Re: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-04 Thread Ere Maijala

Hi,

Solr uses JIRA for issue tickets. You can find it here: 
https://issues.apache.org/jira/browse/SOLR


I'd suggest filing a new bug issue in the SOLR project (note that 
several other projects also use this JIRA installation). Here's an 
example of an existing highlighter issue for reference: 
https://issues.apache.org/jira/browse/SOLR-14019.


See also some brief documentation:

https://cwiki.apache.org/confluence/display/solr/HowToContribute#HowToContribute-JIRAtips(ourissue/bugtracker)

Regards,
Ere


RE: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Flowerday, Matthew J
Hi Ere

Pleased to be of service!

No, I have not filed a JIRA ticket. I am new to interacting with the Solr
Community and only beginning to 'find my legs'. I am not too sure what JIRA
is, I am afraid!

Regards

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com 
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX




Re: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Ere Maijala

Hi,

Whoa, thanks for the heads-up! You may just have saved me from a whole 
lot of trouble. Did you file a JIRA ticket already?


Thanks,
Ere




--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-01 Thread Flowerday, Matthew J
Hi there,

I just came across a situation where a unified highlighting search under
Solr 8.8.0/8.8.1 can take over 20 minutes to run and eventually time out. I
resolved it with a config change, but it can catch you out, hence this email.

With Solr 8.8.0 a new unified highlighting parameter was implemented which,
if not set, defaults to 0.5. It attempts to improve the highlighting so that
the highlighted text does not appear right at the left of the snippet. This
works well, but if a search result has numerous occurrences of the search
term within the record, performance goes right down!

 

2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf]
o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select
params={hl.snippets=2=test=on=100=id,description,specification,score=20=*=10&_=1614405119134}
hits=57008 status=0 QTime=1414320

2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf]
o.a.s.s.HttpSolrCall Unable to write response, client closed connection or
we are shutting down => org.eclipse.jetty.io.EofException
  at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
org.eclipse.jetty.io.EofException: null
  at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
  at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
  at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

 

When I set it to 0.25, results came back much quicker:

2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes]
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select
params={hl.weightMatches=false=on=id,description,specification,score
tart=1=0.25=100=2=test
ars=100=*=unified=9&_=1614430061690}
hits=136939 status=0 QTime=87024

And with 0.1:

2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes]
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select
params={hl.weightMatches=false=on=id,description,specification,score
tart=1=0.1=100=2=test
rs=100=*=unified=9&_=1614430061690}
hits=136939 status=0 QTime=69033

And with 0.0:

2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes]
o.a.s.c.S.Request [holmes]  webapp=/solr path=/select
params={hl.weightMatches=false=on=id,description,specification,score
tart=1=0.0=100=2=test
rs=100=*=unified=9&_=1614430061690}
hits=136939 status=0 QTime=2841

 

I left our setting at 0.0 - this is presumably how it behaved in 7.7.1 (fully
left aligned).  I am not sure how many times a word has to occur in a record
for performance to go right down, but if there are too many it can have a BIG
impact.
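
To put the timings above side by side, here is a minimal Python sketch that
pulls QTime (milliseconds) out of request log lines and converts it to seconds.
The log lines are abridged from the ones quoted above; the labels and the
helper names (qtime_ms, summarize) are made up for illustration. Note that the
first run was against a different core (uleaf, 57008 hits) than the other
three (holmes, 136939 hits), so it is only a rough comparison.

```python
import re

# Request log lines abridged from the message above. The label in each pair is
# the setting value reported for that run ("default" = parameter not set).
LOG_LINES = [
    ("default", "path=/select hits=57008 status=0 QTime=1414320"),
    ("0.25", "path=/select hits=136939 status=0 QTime=87024"),
    ("0.1", "path=/select hits=136939 status=0 QTime=69033"),
    ("0.0", "path=/select hits=136939 status=0 QTime=2841"),
]

QTIME_RE = re.compile(r"\bQTime=(\d+)")


def qtime_ms(line):
    """Return the QTime value (milliseconds) found in a Solr request log line."""
    match = QTIME_RE.search(line)
    if match is None:
        raise ValueError("no QTime in line: " + line)
    return int(match.group(1))


def summarize(lines):
    """Map each label to its QTime converted to seconds."""
    return {label: qtime_ms(line) / 1000.0 for label, line in lines}


if __name__ == "__main__":
    for label, seconds in summarize(LOG_LINES).items():
        print(f"{label:>7}: {seconds:9.1f} s ({seconds / 60:.1f} min)")
```

Run as a script, this shows the unset case at roughly 23.6 minutes against
under 3 seconds at 0.0.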

 

I also noticed that setting =9 did not break out of the
query until it finished, perhaps because the query itself finished quickly and
it was the highlighting that took the time. It might be an idea to get this
setting to also cover any highlighting so that the query does not run until the
Jetty timeout is hit. The machine ran one core at 100% for about 20 minutes!
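
As a stop-gap on the client side, a request can pin the alignment value
explicitly and carry its own socket timeout. The sketch below is Python and
rests on assumptions: the parameter names were lost from the archived message,
and the code assumes they were hl.fragAlignRatio and timeAllowed (both are
real Solr parameters, but that mapping is a guess). The base URL, the field
list and the function names are hypothetical. A client timeout only stops the
caller waiting; it does not stop the work on the Solr side.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_select_url(base_url, core, query, frag_align_ratio=0.0, time_allowed_ms=None):
    """Build a /select URL with unified highlighting and an explicit alignment value."""
    params = {
        "q": query,
        "hl": "on",
        "hl.method": "unified",
        "hl.fl": "description,specification",   # hypothetical field list
        "hl.fragAlignRatio": frag_align_ratio,  # assumed name of the elided parameter
    }
    if time_allowed_ms is not None:
        # Assumed name; per the message above it did not interrupt highlighting.
        params["timeAllowed"] = time_allowed_ms
    return f"{base_url}/{core}/select?{urlencode(params)}"


def fetch(url, client_timeout_s=30):
    """Fetch and decode the JSON response, giving up after client_timeout_s seconds."""
    with urllib.request.urlopen(url, timeout=client_timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch(url) against a running Solr to execute it.
    print(build_select_url("http://localhost:8983/solr", "holmes", "test"))
```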

 

Hope this helps.

 

Regards

 

Matthew

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830 | matthew.flower...@unisys.com

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX

 



Issue regarding highlighting in BlendedInfixLookupFactory

2021-01-27 Thread Debarshi Das
Hi Lucene team,

I have been using Solr 8.4.0 and have encountered an issue where the suggest
highlight feature in “BlendedInfixLookupFactory” does not work whenever I
use “contextField”. While searching online, I came across the page below,
which describes a similar issue but for a different Solr version.

https://issues.apache.org/jira/browse/SOLR-7964

Could you please suggest how to configure Solr, or point me to a version
under 8.4.x that might include a patch for this issue, or to any other version
that supports both “BlendedInfixLookupFactory” and “contextField” along with
the fix?

Regards,
Debarshi


Re: Exact and non exact highlighting

2021-01-22 Thread David Smiley
I'm very familiar with using the Unified Highlighter on a project with this
requirement.  The main "trick" we used was using only one field but
analyzing both ways with a term differentiator (e.g. a leading symbol),
coupled with a custom query parser that knows a phrase query is to be
highlighted using the "exact" analysis as opposed to stemmed/approximate
analysis.  As one can imagine, there was a lot of custom code involved here
for many search requirements; this complexity wasn't just for the
highlighting matter.  Anyway, using one stored field and multiple indexed
fields (ignoring their stored content if any) is a known feature request:
https://issues.apache.org/jira/browse/SOLR-1105  There's even a patch.  I
would love to help get this feature into Solr if you want to take over
there!  The patch needs some work; I really disagree with touching the Solr
schema.  If you are up for it, comment on that issue to let the original
contributor know you want to help move this forward.  Maybe they do too.
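
To make the one-field idea concrete, here is a toy Python sketch. It is not
the project's code and not a Lucene analyzer; the "=" marker, the naive
stemmer and the function names are all invented for illustration. Each token
is indexed twice at the same position, once stemmed and once exact with a
leading marker, and the query side sends quoted phrases to the marked terms
and bare words to the stemmed ones.

```python
import re

EXACT_MARK = "="  # hypothetical differentiator; any symbol the tokenizer never emits would do


def naive_stem(token):
    """Toy stemmer for illustration only: strips one trailing 's' from longer tokens."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyze_both_ways(text):
    """Return (position, term) pairs: a stemmed term and a marked exact term per token."""
    terms = []
    for position, raw in enumerate(text.split()):
        token = raw.lower()
        terms.append((position, naive_stem(token)))
        terms.append((position, EXACT_MARK + token))
    return terms


def query_terms(query):
    """Quoted phrases become marked (exact) terms, bare words become stemmed terms."""
    clauses = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            clauses.append([EXACT_MARK + t.lower() for t in phrase.split()])
        else:
            clauses.append([naive_stem(word.lower())])
    return clauses


if __name__ == "__main__":
    print(analyze_both_ways("Hello Worlds tests"))
    print(query_terms('"Hello World" Test'))
```

Because both variants live in one field at the same positions, a single
highlighting pass over that field sees exact and stemmed matches together.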

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley




Exact and non exact highlighting

2021-01-22 Thread df2832368_...@amberoad.de df2832368_...@amberoad.de
Hello folks,

I am currently working on an issue where we need to enable exact highlighting
on a text field.

The only problem is that parts of the query should still be allowed to match
inexactly (e.g. for "Hello World" Test, "Hello World" needs to be an exact
match, but tests would also match Test).

We have a text field with our normal analyzer pipeline (stemming, ...) and a
copy field which has a reduced pipeline (lowercase filter).

For searching this does its job fine and returns only the correct results by
translating the query to the appropriate fields
(e.g. -> text_exact:"Hello World" AND text:Test)

Now the problem: the highlighting is now split across the two text fields
(which makes sense). So we somehow want to combine those two highlights (they
have the same stored text) to get appropriate "tags" and also scores.

I haven't found a neat solution to this problem so far and would like to ask
if someone has done something similar or has a clear idea of what to do.

I have tried tinkering with our custom extension of the unified highlighter to
somehow merge the passages returned by the highlighter, but this is quite
tedious and error-prone. The next idea was a two-step process: first get the
positions of the exact match in the text_exact field, and afterwards keep only
the highlights that contain these positions. (But I suppose this idea would
still not solve the "tag"(/) problem .)

I would be glad of any help you could offer.
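
The two-step idea can be sketched roughly as follows, in plain Python rather
than highlighter code. Everything here is an assumption for illustration: the
Passage tuple, the function names and the sample text are invented, and a
plain substring search stands in for whatever the text_exact field would
report. It finds the character offsets of the exact phrase, then keeps only
the passages that fully contain one of those offset ranges.

```python
from typing import List, NamedTuple, Tuple


class Passage(NamedTuple):
    """A highlighted passage as character offsets into the stored text."""
    start: int
    end: int
    score: float


def exact_phrase_offsets(text: str, phrase: str) -> List[Tuple[int, int]]:
    """Offsets (start, end) of each case-insensitive occurrence of phrase.

    Assumes lower-casing does not change string length (true for ASCII text).
    """
    haystack, needle = text.lower(), phrase.lower()
    offsets = []
    index = haystack.find(needle)
    while index != -1:
        offsets.append((index, index + len(needle)))
        index = haystack.find(needle, index + 1)
    return offsets


def keep_passages_with_exact_match(passages, exact_offsets):
    """Keep only passages that fully contain at least one exact-match range."""
    return [
        p for p in passages
        if any(p.start <= s and e <= p.end for s, e in exact_offsets)
    ]


if __name__ == "__main__":
    text = "Hello World is a test. Hello there, tests abound in this world."
    passages = [Passage(0, 22, 1.0), Passage(23, len(text), 0.5)]
    offsets = exact_phrase_offsets(text, "Hello World")
    print(keep_passages_with_exact_match(passages, offsets))
```

As noted above, this only filters passages; it does not merge the markup
from the two fields.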

Jan

Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
Hi David

Just reindexed everything and it appears to be performing well and giving
me highlights for the matched text.

Thanks for your help.
Shaun

Re: Highlighting large text fields

2021-01-12 Thread David Smiley
The last update to highlighting that I think is pertinent to
whether highlights match or not is v7.6, which added the hl.weightMatches
option.  So I recommend upgrading to at least that if you want to
experiment further.  But... hl.weightMatches highlights more accurately and
as such is more likely to highlight less than you are highlighting now, and
highlighting more appears to be your goal right now.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jan 12, 2021 at 2:45 PM Shaun Campbell wrote:

> That's great David.  So hl.maxAnalyzedChars isn't that critical. I'll whack
> it right up and see what happens.
>
> I'm running 7.4 from a few years ago. Should I upgrade?
>
> For your info this is what I'm doing with Solr
> https://dev.fundingawards.nihr.ac.uk/search.
>
> Thanks
> Shaun
>
> On Tue, 12 Jan 2021 at 19:33, David Smiley wrote:
>
> > On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell wrote:
> >
> > > Hi David
> > >
> > > Getting closer now.
> > >
> > > First of all, a bit of a mistake on my part. I have two cores set up
> > > and I was changing the solrconfig.xml on the wrong core doh!!  That's
> > > why highlighting wasn't being turned off.
> > >
> > > I think I've got the unified highlighter working.
> > > storeOffsetsWithPositions was already configured on my field type
> > > definition, not the field definition, so that was ok.
> > >
> > > What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> > > highlighting on some records and not others, making it confusing as to
> > > where the match is with my dismax parser.  I increased
> > > my hl.maxAnalyzedChars to 130 and now it's highlighting more records.
> > > Two questions:
> > >
> > > 1. Have you any guidelines as to what could be a
> > > maximum hl.maxAnalyzedChars without impacting performance or memory?
> > >
> >
> > With storeOffsetsWithPositions, highlighting is super-fast, and so this
> > hl.maxAnalyzedChars threshold is of marginal utility, like only to cap
> > the amount of memory used if you have some truly humongous docs and it's
> > okay only highlight the first X megabytes of them.  Maybe set to a 100MB
> > worth of text, or something like that.
> >
> >
> > > 2. Do you know a way to query the maximum length of text in a field so
> > > that I can set hl.maxAnalyzedChars accordingly?  Just thinking I can
> > > probably modify my java indexer to log the maximum content length.
> > > Actually, I probably don't want the maximum but some value that
> > > highlights 90-95% records
> > >
> >
> > Eh... not really.  Maybe some approximation hacks involving function
> > queries on norms but I'd not bother in favor of just using a high
> > threshold such that this won't be an issue.
> >
> > All this said, this threshold is *not* the only reason why you might not
> > be getting highlights that you expect.  If you are using a recent Solr
> > version, you might try toggling the hl.weightMatches boolean, which could
> > make a difference for certain query arrangements.  There's a JIRA issue
> > pertaining to this one, and I haven't investigated it yet.
> >
> > ~ David
> >
> >
> > >
> > > Thanks
> > > Shaun
> > >
> > > On Tue, 12 Jan 2021 at 16:30, David Smiley wrote:
> > >
> > > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell
> > > > <campbell.sh...@gmail.com> wrote:
> > > >
> > > > > Hi David
> > > > >
> > > > > First of all I wanted to say I'm working off your book!!  Third
> > > > > edition, and I think it's a bit out of date now. I was just going
> > > > > to try following the section on the Postings highlighter, but I see
> > > > > that's been absorbed into the Unified highlighter. I find your book
> > > > > easier to follow than the official documentation though.
> > > > >
> > > >
> > > > Thanks :-D.  I do maintain the Solr Reference Guide for the parts of
> > > > code I touch, including highlighting, so I hope what's there makes
> > > > sense too.
> > > >
> > > >
> > > > > I am going to try to configure the unified highlighter, and I will
> > > > > add that

Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
That's great David.  So hl.maxAnalyzedChars isn't that critical. I'll whack
it right up and see what happens.
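
If a tighter cap were ever wanted instead (the earlier question about a value
that highlights 90-95% of records), one option is to record content lengths
at index time and take a nearest-rank percentile. This is a minimal Python
sketch under that assumption; the function name and the sample lengths are
made up, and nothing here queries Solr.

```python
import math


def length_covering(lengths, fraction):
    """Smallest recorded length that covers `fraction` of the records.

    Nearest-rank percentile: sort the lengths and take the value at
    rank ceil(fraction * n), counting from 1.
    """
    if not lengths:
        raise ValueError("no lengths recorded")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths)
    rank = math.ceil(fraction * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical character counts logged by an indexer, one per document.
    sample_lengths = [1200, 800, 150000, 4300, 2500, 9100, 600, 3200, 7000, 1300000]
    for fraction in (0.5, 0.9, 1.0):
        print(fraction, length_covering(sample_lengths, fraction))
```

A value picked this way would then be passed as hl.maxAnalyzedChars.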

I'm running 7.4 from a few years ago. Should I upgrade?

For your info this is what I'm doing with Solr
https://dev.fundingawards.nihr.ac.uk/search.

Thanks
Shaun

On Tue, 12 Jan 2021 at 19:33, David Smiley  wrote:

> On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell wrote:
>
> > Hi David
> >
> > Getting closer now.
> >
> > First of all, a bit of a mistake on my part. I have two cores set up and
> I
> > was changing the solrconfig.xml on the wrong core doh!!  That's why
> > highlighting wasn't being turned off.
> >
> > I think I've got the unified highlighter working.
> > storeOffsetsWithPositions was already configured on my field type
> > definition, not the field definition, so that was ok.
> >
> > What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> > highlighting on some records and not others, making it confusing as to
> > where the match is with my dismax parser.  I increased
> > my hl.maxAnalyzedChars to 130 and now it's highlighting more records.
> > Two questions:
> >
> > 1. Have you any guidelines as to what could be a
> > maximum hl.maxAnalyzedChars without impacting performance or memory?
> >
>
> With storeOffsetsWithPositions, highlighting is super-fast, and so this
> hl.maxAnalyzedChars threshold is of marginal utility, like only to cap the
> amount of memory used if you have some truly humongous docs and it's okay
> only highlight the first X megabytes of them.  Maybe set to a 100MB worth
> of text, or something like that.
>
>
> > 2. Do you know a way to query the maximum length of text in a field so
> that
> > I can set hl.maxAnalyzedChars accordingly?  Just thinking I can probably
> > modify my java indexer to log the maximum content length.  Actually, I
> > probably don't want the maximum but some value that highlights 90-95%
> > records
> >
>
> Eh... not really.  Maybe some approximation hacks involving function
> queries on norms but I'd not bother in favor of just using a high threshold
> such that this won't be an issue.
>
> All this said, this threshold is *not* the only reason why you might not be
> getting highlights that you expect.  If you are using a recent Solr
> version, you might try toggling the hl.weightMatches boolean, which could
> make a difference for certain query arrangements.  There's a JIRA issue
> pertaining to this one, and I haven't investigated it yet.
>
> ~ David
>
>
> >
> > Thanks
> > Shaun
> >
> > On Tue, 12 Jan 2021 at 16:30, David Smiley  wrote:
> >
> > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <
> campbell.sh...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi David
> > > >
> > > > First of all I wanted to say I'm working off your book!!  Third
> > edition,
> > > > and I think it's a bit out of date now. I was just going to try
> > following
> > > > the section on the Postings highlighter, but I see that's been
> absorbed
> > > > into the Unified highlighter. I find your book easier to follow than
> > the
> > > > official documentation though.
> > > >
> > >
> > > Thanks :-D.  I do maintain the Solr Reference Guide for the parts of
> > code I
> > > touch, including highlighting, so I hope what's there makes sense too.
> > >
> > >
> > > > I am going to try to configure the unified highlighter, and I will
> add
> > > that
> > > > storeOffsetsWithPositions to the schema (which I saw in your book)
> and
> > I
> > > > will try indexing again from scratch.  Was getting some funny things
> > > going
> > > > on where I thought I'd turned highlighting off and it was still
> giving
> > me
> > > > highlights.
> > > >
> > >
> > > hl=true/false
> > >
> > >
> > > > Actually just re-reading your email again, are you saying that you
> > can't
> > > > configure highlighting in solrconfig.xml? That's where I always
> > configure
> > > > original highlighting in my dismax search handler. Am I supposed to
> add
> > > > highlighting to each request?
> > > >
> > >
> > > You can set highlighting and other *parameters* in solrconfig.xml for
> > > request handlers.  But the dedicated  plugin info is only
> > for
> > > the original and Fast Vector Highlighters.
> > >
> > > ~ David

Re: Highlighting large text fields

2021-01-12 Thread David Smiley
On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell 
wrote:

> Hi David
>
> Getting closer now.
>
> First of all, a bit of a mistake on my part. I have two cores set up and I
> was changing the solrconfig.xml on the wrong core doh!!  That's why
> highlighting wasn't being turned off.
>
> I think I've got the unified highlighter working.
> storeOffsetsWithPositions was already configured on my field type
> definition, not the field definition, so that was ok.
>
> What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> highlighting on some records and not others, making it confusing as to
> where the match is with my dismax parser.  I increased
> my hl.maxAnalyzedChars to 130 and now it's highlighting more records.
> Two questions:
>
> 1. Have you any guidelines as to what could be a
> maximum hl.maxAnalyzedChars without impacting performance or memory?
>

With storeOffsetsWithPositions, highlighting is super-fast, and so this
hl.maxAnalyzedChars threshold is of marginal utility, like only to cap the
amount of memory used if you have some truly humongous docs and it's okay to
only highlight the first X megabytes of them.  Maybe set it to 100MB worth
of text, or something like that.
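
As a rough illustration of the suggestion above, here is a minimal sketch of request parameters with a very high hl.maxAnalyzedChars. The field name "content" and the query term are assumptions made for the example, and treating 100MB as 100 * 1024 * 1024 characters is a simplification, since the parameter counts characters rather than bytes.

```python
from urllib.parse import urlencode

# Roughly "100MB worth of text", expressed as a character count.
# Assumption: one character per byte, which is only an approximation.
MAX_ANALYZED_CHARS = 100 * 1024 * 1024


def highlight_params(query, field):
    """Return request parameters for a unified-highlighter search."""
    return {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": field,  # hypothetical field name supplied by the caller
        "hl.maxAnalyzedChars": str(MAX_ANALYZED_CHARS),
    }


params = highlight_params("cancer", "content")
print(urlencode(params))
```

The resulting query string could be appended to a /select request, or the same parameters could be placed in the request handler configuration.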


> 2. Do you know a way to query the maximum length of text in a field so that
> I can set hl.maxAnalyzedChars accordingly?  Just thinking I can probably
> modify my java indexer to log the maximum content length.  Actually, I
> probably don't want the maximum but some value that highlights 90-95%
> records
>

Eh... not really.  Maybe some approximation hacks involving function
queries on norms but I'd not bother in favor of just using a high threshold
such that this won't be an issue.
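
If you did want to log content lengths in the indexer, as the question suggests, a nearest-rank percentile is probably enough. This is only a sketch: the lengths below are made-up sample values, and the function name is hypothetical.

```python
import math


def length_percentile(lengths, percent):
    """Nearest-rank percentile of a list of content lengths (in characters)."""
    if not lengths:
        raise ValueError("no lengths recorded")
    ordered = sorted(lengths)
    # Nearest-rank method: smallest value with at least `percent` % of data at or below it.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical content lengths logged while indexing.
content_lengths = [1200, 48000, 90000, 150000, 250000,
                   255000, 300000, 310000, 420000, 2000000]

threshold = length_percentile(content_lengths, 90)
print(threshold)
```

A value picked this way would cover about 90% of the records, at the cost of truncating highlighting on the longest ones.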

All this said, this threshold is *not* the only reason why you might not be
getting highlights that you expect.  If you are using a recent Solr
version, you might try toggling the hl.weightMatches boolean, which could
make a difference for certain query arrangements.  There's a JIRA issue
pertaining to this one, and I haven't investigated it yet.

~ David


>
> Thanks
> Shaun
>
> On Tue, 12 Jan 2021 at 16:30, David Smiley  wrote:
>
> > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell  >
> > wrote:
> >
> > > Hi David
> > >
> > > First of all I wanted to say I'm working off your book!!  Third
> edition,
> > > and I think it's a bit out of date now. I was just going to try
> following
> > > the section on the Postings highlighter, but I see that's been absorbed
> > > into the Unified highlighter. I find your book easier to follow than
> the
> > > official documentation though.
> > >
> >
> > Thanks :-D.  I do maintain the Solr Reference Guide for the parts of
> code I
> > touch, including highlighting, so I hope what's there makes sense too.
> >
> >
> > > I am going to try to configure the unified highlighter, and I will add
> > that
> > > storeOffsetsWithPositions to the schema (which I saw in your book) and
> I
> > > will try indexing again from scratch.  Was getting some funny things
> > going
> > > on where I thought I'd turned highlighting off and it was still giving
> me
> > > highlights.
> > >
> >
> > hl=true/false
> >
> >
> > > Actually just re-reading your email again, are you saying that you
> can't
> > > configure highlighting in solrconfig.xml? That's where I always
> configure
> > > original highlighting in my dismax search handler. Am I supposed to add
> > > highlighting to each request?
> > >
> >
> > You can set highlighting and other *parameters* in solrconfig.xml for
> > request handlers.  But the dedicated  plugin info is only
> for
> > the original and Fast Vector Highlighters.
> >
> > ~ David
> >
> >
> > >
> > > Thanks
> > > Shaun
> > >
> > > On Mon, 11 Jan 2021 at 20:57, David Smiley  wrote:
> > >
> > > > Hello!
> > > >
> > > > I worked on the UnifiedHighlighter a lot and want to help you!
> > > >
> > > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <
> > campbell.sh...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > I've been using highlighting for a while, using the original
> > > highlighter,
> > > > > and just come across a problem with fields that contain a large
> > amount
> > > of
> > > > > text, approx 250k characters. I only have about 2,000 records but
> > each
> > > > one
> > > > > contains a journal publication to 

Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
Hi David

Getting closer now.

First of all, a bit of a mistake on my part. I have two cores set up and I
was changing the solrconfig.xml on the wrong core doh!!  That's why
highlighting wasn't being turned off.

I think I've got the unified highlighter working.
storeOffsetsWithPositions was already configured on my field type
definition, not the field definition, so that was ok.

What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
highlighting on some records and not others, making it confusing as to
where the match is with my dismax parser.  I increased
my hl.maxAnalyzedChars to 130 and now it's highlighting more records.
Two questions:

1. Have you any guidelines as to what could be a
maximum hl.maxAnalyzedChars without impacting performance or memory?

2. Do you know a way to query the maximum length of text in a field so that
I can set hl.maxAnalyzedChars accordingly?  Just thinking I can probably
modify my java indexer to log the maximum content length.  Actually, I
probably don't want the maximum but some value that highlights 90-95%
records

Thanks
Shaun

On Tue, 12 Jan 2021 at 16:30, David Smiley  wrote:

> On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell 
> wrote:
>
> > Hi David
> >
> > First of all I wanted to say I'm working off your book!!  Third edition,
> > and I think it's a bit out of date now. I was just going to try following
> > the section on the Postings highlighter, but I see that's been absorbed
> > into the Unified highlighter. I find your book easier to follow than the
> > official documentation though.
> >
>
> Thanks :-D.  I do maintain the Solr Reference Guide for the parts of code I
> touch, including highlighting, so I hope what's there makes sense too.
>
>
> > I am going to try to configure the unified highlighter, and I will add
> that
> > storeOffsetsWithPositions to the schema (which I saw in your book) and I
> > will try indexing again from scratch.  Was getting some funny things
> going
> > on where I thought I'd turned highlighting off and it was still giving me
> > highlights.
> >
>
> hl=true/false
>
>
> > Actually just re-reading your email again, are you saying that you can't
> > configure highlighting in solrconfig.xml? That's where I always configure
> > original highlighting in my dismax search handler. Am I supposed to add
> > highlighting to each request?
> >
>
> You can set highlighting and other *parameters* in solrconfig.xml for
> request handlers.  But the dedicated  plugin info is only for
> the original and Fast Vector Highlighters.
>
> ~ David
>
>
> >
> > Thanks
> > Shaun
> >
> > On Mon, 11 Jan 2021 at 20:57, David Smiley  wrote:
> >
> > > Hello!
> > >
> > > I worked on the UnifiedHighlighter a lot and want to help you!
> > >
> > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <
> campbell.sh...@gmail.com
> > >
> > > wrote:
> > >
> > > > I've been using highlighting for a while, using the original
> > highlighter,
> > > > and just come across a problem with fields that contain a large
> amount
> > of
> > > > text, approx 250k characters. I only have about 2,000 records but
> each
> > > one
> > > > contains a journal publication to search through.
> > > >
> > > > What I noticed is that some records didn't return a highlight even
> > though
> > > > they matched on the content. I noticed the hl.maxAnalyzedChars
> > parameter
> > > > and increased that, but  it allowed some records to be highlighted,
> but
> > > not
> > > > all, and then it caused memory problems on the server.  Performance
> is
> > > also
> > > > very poor.
> > > >
> > >
> > > I've been thinking hl.maxAnalyzedChars should maybe default to no limit
> > --
> > > it's a performance threshold but perhaps better to opt-in to such a
> limit
> > > then scratch your head for a long time wondering why a search result
> > isn't
> > > showing highlights.
> > >
> > >
> > > > To try to fix this I've tried  to configure the unified highlighter
> in
> > my
> > > > solrconfig.xml instead.   It seems to be working but again I'm
> missing
> > > some
> > > > highlighted records.
> > > >
> > >
> > > There is no configuration of that highlighter in solrconfig.xml; it's
> > > entirely parameter driven (runtime).
> > >
> > >
> > > > The other thing is I've tried to adjust my unified highlighting
> > set

Re: Highlighting large text fields

2021-01-12 Thread David Smiley
On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell 
wrote:

> Hi David
>
> First of all I wanted to say I'm working off your book!!  Third edition,
> and I think it's a bit out of date now. I was just going to try following
> the section on the Postings highlighter, but I see that's been absorbed
> into the Unified highlighter. I find your book easier to follow than the
> official documentation though.
>

Thanks :-D.  I do maintain the Solr Reference Guide for the parts of code I
touch, including highlighting, so I hope what's there makes sense too.


> I am going to try to configure the unified highlighter, and I will add that
> storeOffsetsWithPositions to the schema (which I saw in your book) and I
> will try indexing again from scratch.  Was getting some funny things going
> on where I thought I'd turned highlighting off and it was still giving me
> highlights.
>

hl=true/false


> Actually just re-reading your email again, are you saying that you can't
> configure highlighting in solrconfig.xml? That's where I always configure
> original highlighting in my dismax search handler. Am I supposed to add
> highlighting to each request?
>

You can set highlighting and other *parameters* in solrconfig.xml for
request handlers.  But the dedicated <highlighting> plugin info is only for
the original and Fast Vector Highlighters.

~ David
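
To make the distinction concrete, here is a small sketch of how handler-level parameters and per-request parameters combine. It assumes the parameters sit in the handler's "defaults" section, where a request parameter with the same name takes precedence; the dictionary below is a hypothetical stand-in for that section, not real configuration.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the "defaults" of a request handler in solrconfig.xml.
HANDLER_DEFAULTS = {"hl": "true", "hl.method": "unified", "hl.fl": "content"}


def effective_params(request_params):
    """Merge request parameters over handler defaults (request wins)."""
    merged = dict(HANDLER_DEFAULTS)
    merged.update(request_params)
    return merged


# Turning highlighting off for a single request with hl=false.
off = effective_params({"q": "heart", "hl": "false"})
print(urlencode(off))
```

So highlighting does not have to be added to each request, but any single request can still switch it on or off with hl=true/false.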


>
> Thanks
> Shaun
>
> On Mon, 11 Jan 2021 at 20:57, David Smiley  wrote:
>
> > Hello!
> >
> > I worked on the UnifiedHighlighter a lot and want to help you!
> >
> > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell  >
> > wrote:
> >
> > > I've been using highlighting for a while, using the original
> highlighter,
> > > and just come across a problem with fields that contain a large amount
> of
> > > text, approx 250k characters. I only have about 2,000 records but each
> > one
> > > contains a journal publication to search through.
> > >
> > > What I noticed is that some records didn't return a highlight even
> though
> > > they matched on the content. I noticed the hl.maxAnalyzedChars
> parameter
> > > and increased that, but  it allowed some records to be highlighted, but
> > not
> > > all, and then it caused memory problems on the server.  Performance is
> > also
> > > very poor.
> > >
> >
> > I've been thinking hl.maxAnalyzedChars should maybe default to no limit
> --
> > it's a performance threshold but perhaps better to opt-in to such a limit
> > then scratch your head for a long time wondering why a search result
> isn't
> > showing highlights.
> >
> >
> > > To try to fix this I've tried  to configure the unified highlighter in
> my
> > > solrconfig.xml instead.   It seems to be working but again I'm missing
> > some
> > > highlighted records.
> > >
> >
> > There is no configuration of that highlighter in solrconfig.xml; it's
> > entirely parameter driven (runtime).
> >
> >
> > > The other thing is I've tried to adjust my unified highlighting
> settings
> > in
> > > solrconfig.xml and they don't  seem to be having any effect even after
> > > restarting Solr.  I was just wondering whether there is any
> highlighting
> > > information stored at index time. It's taking over 4hours to index my
> > > records so it's not easy to keep reindexing my content.
> > >
> > > Any ideas on how to handle highlighting of large content  would be
> > > appreciated.
> > >
> > > Shaun
> > >
> >
> > Please read the documentation here thoroughly:
> >
> >
> https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> > (or earlier version as applicable)
> > Since you have large bodies of text to highlight, you would strongly
> > benefit from putting offsets into the search index (and re-index) --
> > storeOffsetsWithPositions.  That's an option on the field/fieldType in
> your
> > schema; it may not be obvious reading the docs.  You have to opt-in to
> > that; Solr doesn't normally store any info in the index for highlighting.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
>


Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
Hi David

First of all I wanted to say I'm working off your book!!  Third edition,
and I think it's a bit out of date now. I was just going to try following
the section on the Postings highlighter, but I see that's been absorbed
into the Unified highlighter. I find your book easier to follow than the
official documentation though.

I am going to try to configure the unified highlighter, and I will add that
storeOffsetsWithPositions to the schema (which I saw in your book) and I
will try indexing again from scratch.  Was getting some funny things going
on where I thought I'd turned highlighting off and it was still giving me
highlights.

Actually just re-reading your email again, are you saying that you can't
configure highlighting in solrconfig.xml? That's where I always configure
original highlighting in my dismax search handler. Am I supposed to add
highlighting to each request?

Thanks
Shaun

On Mon, 11 Jan 2021 at 20:57, David Smiley  wrote:

> Hello!
>
> I worked on the UnifiedHighlighter a lot and want to help you!
>
> On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell 
> wrote:
>
> > I've been using highlighting for a while, using the original highlighter,
> > and just come across a problem with fields that contain a large amount of
> > text, approx 250k characters. I only have about 2,000 records but each
> one
> > contains a journal publication to search through.
> >
> > What I noticed is that some records didn't return a highlight even though
> > they matched on the content. I noticed the hl.maxAnalyzedChars parameter
> > and increased that, but  it allowed some records to be highlighted, but
> not
> > all, and then it caused memory problems on the server.  Performance is
> also
> > very poor.
> >
>
> I've been thinking hl.maxAnalyzedChars should maybe default to no limit --
> it's a performance threshold but perhaps better to opt-in to such a limit
> then scratch your head for a long time wondering why a search result isn't
> showing highlights.
>
>
> > To try to fix this I've tried  to configure the unified highlighter in my
> > solrconfig.xml instead.   It seems to be working but again I'm missing
> some
> > highlighted records.
> >
>
> There is no configuration of that highlighter in solrconfig.xml; it's
> entirely parameter driven (runtime).
>
>
> > The other thing is I've tried to adjust my unified highlighting settings
> in
> > solrconfig.xml and they don't  seem to be having any effect even after
> > restarting Solr.  I was just wondering whether there is any highlighting
> > information stored at index time. It's taking over 4hours to index my
> > records so it's not easy to keep reindexing my content.
> >
> > Any ideas on how to handle highlighting of large content  would be
> > appreciated.
> >
> > Shaun
> >
>
> Please read the documentation here thoroughly:
>
> https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> (or earlier version as applicable)
> Since you have large bodies of text to highlight, you would strongly
> benefit from putting offsets into the search index (and re-index) --
> storeOffsetsWithPositions.  That's an option on the field/fieldType in your
> schema; it may not be obvious reading the docs.  You have to opt-in to
> that; Solr doesn't normally store any info in the index for highlighting.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


Re: Highlighting large text fields

2021-01-11 Thread David Smiley
Hello!

I worked on the UnifiedHighlighter a lot and want to help you!

On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell 
wrote:

> I've been using highlighting for a while, using the original highlighter,
> and just come across a problem with fields that contain a large amount of
> text, approx 250k characters. I only have about 2,000 records but each one
> contains a journal publication to search through.
>
> What I noticed is that some records didn't return a highlight even though
> they matched on the content. I noticed the hl.maxAnalyzedChars parameter
> and increased that, but  it allowed some records to be highlighted, but not
> all, and then it caused memory problems on the server.  Performance is also
> very poor.
>

I've been thinking hl.maxAnalyzedChars should maybe default to no limit --
it's a performance threshold but perhaps better to opt-in to such a limit
than scratch your head for a long time wondering why a search result isn't
showing highlights.


> To try to fix this I've tried  to configure the unified highlighter in my
> solrconfig.xml instead.   It seems to be working but again I'm missing some
> highlighted records.
>

There is no configuration of that highlighter in solrconfig.xml; it's
entirely parameter driven (runtime).


> The other thing is I've tried to adjust my unified highlighting settings in
> solrconfig.xml and they don't  seem to be having any effect even after
> restarting Solr.  I was just wondering whether there is any highlighting
> information stored at index time. It's taking over 4hours to index my
> records so it's not easy to keep reindexing my content.
>
> Any ideas on how to handle highlighting of large content  would be
> appreciated.
>
> Shaun
>

Please read the documentation here thoroughly:
https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
(or earlier version as applicable)
Since you have large bodies of text to highlight, you would strongly
benefit from putting offsets into the search index (and re-index) --
storeOffsetsWithPositions.  That's an option on the field/fieldType in your
schema; it may not be obvious reading the docs.  You have to opt-in to
that; Solr doesn't normally store any info in the index for highlighting.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
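
For reference, opting a field in to offsets might look like the sketch below, which builds a Schema API "replace-field" command. It assumes a managed schema, and the field name "content" and type "text_general" are placeholders; the same attribute can equally be set by hand on the field or fieldType in the schema file.

```python
import json


def offsets_field_payload(name, field_type):
    """Build a Schema API 'replace-field' command that stores offsets in the index."""
    return {
        "replace-field": {
            "name": name,              # hypothetical field name
            "type": field_type,        # hypothetical field type
            "indexed": True,
            "stored": True,
            "storeOffsetsWithPositions": True,
        }
    }


payload = offsets_field_payload("content", "text_general")
print(json.dumps(payload, indent=2))
```

The JSON would be POSTed to the collection's schema endpoint, and the content then has to be re-indexed before the offsets are available to the highlighter.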


Highlighting large text fields

2021-01-11 Thread Shaun Campbell
I've been using highlighting for a while, using the original highlighter,
and just come across a problem with fields that contain a large amount of
text, approx 250k characters. I only have about 2,000 records but each one
contains a journal publication to search through.

What I noticed is that some records didn't return a highlight even though
they matched on the content. I noticed the hl.maxAnalyzedChars parameter
and increased that, but  it allowed some records to be highlighted, but not
all, and then it caused memory problems on the server.  Performance is also
very poor.

To try to fix this I've tried  to configure the unified highlighter in my
solrconfig.xml instead.   It seems to be working but again I'm missing some
highlighted records.

The other thing is I've tried to adjust my unified highlighting settings in
solrconfig.xml and they don't seem to be having any effect even after
restarting Solr.  I was just wondering whether there is any highlighting
information stored at index time. It's taking over 4 hours to index my
records so it's not easy to keep reindexing my content.

Any ideas on how to handle highlighting of large content  would be
appreciated.

Shaun


Re: Solr Highlighting not working

2020-11-30 Thread Ajay Sharma
Hi All,

pushing the query to the top.
Does anyone have any idea about it?


On Fri, Nov 27, 2020 at 11:49 AM Ajay Sharma  wrote:

> Hi Community,
>
> This is the first time, I am implementing a solr *highlighting *feature.
> I have read the concept via solr documentation
> Link- https://lucene.apache.org/solr/guide/8_2/highlighting.html
>
> To enable highlighting I just have to add *=true=* *in our solr
> query and got the snippet in the solr response and it is working fine in
> most of the cases.
>
> *But highlighting does not work when synonyms came into action*
>
> *Issue:*
> I am searching leopard (q=leopard) in field title (qf=title)
>
> In our synonym file, we have an entry like below
> *leopard,tenduaa,panther*
>
> and in one document id:123456, field title contains below text:
> title:"Jindal Panther TMT Bars
>
> For the query (q=leopard) , i am getting this document (id:123456) in solr
> response
> I could check that due to synonym document is matched  and I confirmed it
> via Solr UI analysis screen where I put Analyse FieldName= title,  Field
> Value (Index) ="Jindal Panther TMT rebars" and Field Value (Query) =
> leopard and I could see in index chain, token panther getting saved as
> leopard also but in highlighting I don't get any matched token and
> getting below response
>
>
>- highlighting:
>{
>   - 123456: { }
>   }
>
>
>
> I just need the matched synonym token like panther in the above case to be
> returned in solr highlighting response
> I have read and re-read the solr documentation, searched on google gone
> through many articles even checked StackOverflow but could not find a
> solution.
> Any help from community members will be highly appreciated.
>
> Thanks in advance.
>
>
> --
> Regards,
> Ajay Sharma
> Software Engineer, Product-Search,
> IndiaMART InterMESH Ltd
>


-- 
Thanks & Regards,
Ajay Sharma
Software Engineer, Product-Search,
IndiaMART InterMESH Ltd




Solr Highlighting not working

2020-11-26 Thread Ajay Sharma
Hi Community,

This is the first time I am implementing the Solr *highlighting* feature.
I have read about the concept in the Solr documentation:
https://lucene.apache.org/solr/guide/8_2/highlighting.html

To enable highlighting I just have to add &hl=true&hl.fl=* to our Solr
query to get the snippet in the Solr response, and it works fine in
most cases.

*But highlighting does not work when synonyms came into action*

*Issue:*
I am searching leopard (q=leopard) in field title (qf=title)

In our synonym file, we have an entry like below
*leopard,tenduaa,panther*

and in one document id:123456, field title contains below text:
title:"Jindal Panther TMT Bars"

For the query (q=leopard), I am getting this document (id:123456) in the Solr
response.
I could check that the document matched because of the synonym, and I confirmed
it via the Solr UI analysis screen, where I set Analyse Fieldname = title, Field
Value (Index) = "Jindal Panther TMT rebars" and Field Value (Query) =
leopard. I could see in the index chain that the token panther is also saved as
leopard, but in highlighting I don't get any matched token and get the
response below:


   "highlighting": {
     "123456": { }
   }



I just need the matched synonym token (panther in the above case) to be
returned in the Solr highlighting response.
I have read and re-read the Solr documentation, searched on Google, gone
through many articles, and even checked Stack Overflow, but could not find a
solution.
Any help from community members will be highly appreciated.
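
To make the symptom easier to reproduce, here is a small sketch that lists the documents that matched but came back with an empty highlighting entry. The JSON string is a hand-written imitation of the response described above, not real Solr output, and the function name is hypothetical.

```python
import json


def docs_without_snippets(response):
    """Return ids whose entry in the 'highlighting' section has no snippets."""
    highlighting = response.get("highlighting", {})
    return [doc_id for doc_id, fields in highlighting.items() if not fields]


# Hand-written sample shaped like the response described above.
raw = """
{
  "response": {"numFound": 1, "docs": [{"id": "123456"}]},
  "highlighting": {"123456": {}}
}
"""

missing = docs_without_snippets(json.loads(raw))
print(missing)
```

Running this over a batch of queries would show whether empty highlights are limited to synonym matches.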

Thanks in advance.


-- 
Regards,
Ajay Sharma
Software Engineer, Product-Search,
IndiaMART InterMESH Ltd




Re: Highlighting values of non stored fields

2020-06-14 Thread mosh bla



Thanks Erick, indeed that was my problem and you helped me understand how the hl 
component works, but I still can't understand how I can avoid storing all of the 
field's variations. For example, if I need to support morphological search, I 
have 2 fields:



Say we indexed the following doc:
{
    "doc_text": "walking dead"
}
The following queries should match:
q = walking
q = walk
I am issuing an edismax query with qf="doc_text^2 doc_text_morph" (boosts are 
currently missing) and adding highlight params. 'walk' will be matched on 
doc_text_morph, but will only be highlighted iff doc_text_morph is stored (no 
match on stored field doc_text...). Is there any way to get it highlighted 
without also storing the doc_text_morph field?
Thanks again...
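
For anyone trying to reproduce this, the request described above could be built roughly as follows. This is a sketch only: the parameter values come from the description above, and hl.fl listing both fields is an assumption about how the highlight params were set.

```python
from urllib.parse import urlencode

# Parameters as described above; hl.fl covering both fields is an assumption.
params = {
    "defType": "edismax",
    "q": "walk",
    "qf": "doc_text^2 doc_text_morph",
    "hl": "true",
    "hl.fl": "doc_text,doc_text_morph",
}

# urlencode percent-encodes the boost caret and the comma, and turns spaces into '+'.
query_string = urlencode(params)
print(query_string)
```

With doc_text_morph not stored, the match on 'walk' comes from that field while only doc_text has stored text to highlight.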
 
 
 
 

Sent: Monday, June 08, 2020 at 3:39 PM
From: "Erick Erickson" 
To: solr-user@lucene.apache.org
Subject: Re: Highlighting values of non stored fields
When highlighting, the stored data for the field is re-analyzed against the 
query based on the field you’re highlighting. My bet is that if you query just 
“q=doc_text:mosh” you will not get a hit. Check your text_ws fieldType, it’s 
probably case sensitive. So if you changed the doc_text type to text_general 
(the same as your dynamic field), I think you’d be fine. re-index your data of 
course….

I’ll add by-the-by that text_ws is a fairly restricted, and is rarely useful 
for searching on anything humans have to key in. It’ll include punctuation for 
instance, i.e. input like “dog dog.” will produce two tokens, one with a period 
in the token and one without. It’s most useful for heavily-preprocessed data 
where the app normalizes the input or machine-generated input.

There’s no reason, BTW, to index your doc_text for highlighting purposes since 
the stored data is what counts. Unless, of course, you want to search on that 
field specifically.

Best,
Erick

> On Jun 7, 2020, at 11:32 PM, mosh bla  wrote:
>
>
> Thanks Erick for the reply. Your answer is exactly what I was expecting from 
> the highlight component but it seems like I am getting different behaviour.
> I'll try to give a simple example and I hope you can explain where my 
> mistake is.
> Say I have the following fields configuration:
> 
> 
> 
>
> And I indexed the following document:
> {
> "doc_text": "MOSH"
> }
>
> When executing the following query 
> "http://.../select?q=doc_text_lw:mosh&hl=true&hl.fl=doc_text" - the document 
> is matched and returned in response, but the highlighted fragment is empty.
> I also tried to change the 'hl.method' param to 'unified' and 'fastVector' but 
> had no luck either. My conclusion was that the 'hl.fl' param should be set to 
> 'doc_text_lw' and it must also be stored...
>
>
>
>
> Sent: Tuesday, June 02, 2020 at 3:15 PM
> From: "Erick Erickson" 
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting values of non stored fields
> Why do you think even variants need to be stored/highlighted? Usually
> when you store variants for ranking purposes those extra copies are
> invisible to the user. So most often people store exactly one copy
> of a particular field and highlight _that_ field in the return.
>
> So say my field is f1 and I have indexed f1_1, f1_2, f1_3. I just store
> f1_1 and return the highlighted text from that one.
>
> You could even just store the data only once in a field that’s never
> indexed and return/highlight that if you wanted.
>
> Best,
> Erick
>
>> On Jun 2, 2020, at 3:24 AM, mosheB  wrote:
>>
>> Our use case is as follow:
>> We are indexing free text documents. Each document contains metadata fields
>> (such as author, creation date...) which are kinda small, and one "big"
>> field that holds the document's text itself.
>>
>> For ranking purposes each field is indexed in more than one "variation" and
>> query is executed with edismax query parser. Things are working alright, but
>> now a new feature is requested by the customer - highlighting.
>> To enable highlighting every field must be stored, including all variations
>> of the big text field. This pushes our storage to the limit (and probably
>> the document cache...) and feels a bit redundant, as the stored value is
>> duplicated n times... Is there any way to “reference” stored value from one
>> field to another?
>> For example:
>> Say we have the following config:
>> > />
>> > />
>>
>> 
>> 
>> 
>>
>> And we execute the following query:
>> http://.../select?defType=edismax&q=desired_terms&qf=doc_text^2
>> doc_text_bigrams^3
>> doc_text_phrases^4&hl=on&hl.fl=doc_text,doc_text_bigrams,doc_text_phrases
>>
>> Highlight fragments in response will be blank if match occurred on the
>> non-stored fields (doc_text_bigrams or doc_text_phrases). Is it possible to
>> pass extra parameter to the highlight component, to point it to the stored
>> data of the “original” doc_text field? a kind of “stored value reference
>> field”?
>>
>> Thanks in advance.
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
 


Re: Highlighting values of non stored fields

2020-06-08 Thread Erick Erickson
When highlighting, the stored data for the field is re-analyzed against the 
query based on the field you’re highlighting. My bet is that if you query just 
“q=doc_text:mosh” you will not get a hit. Check your text_ws fieldType, it’s 
probably case sensitive. So if you changed the doc_text type to text_general 
(the same as your dynamic field), I think you’d be fine. Re-index your data, of 
course.

I’ll add by-the-by that text_ws is fairly restrictive, and is rarely useful 
for searching on anything humans have to key in. It’ll include punctuation for 
instance, i.e. input like “dog dog.” will produce two tokens, one with a period 
in the token and one without. It’s most useful for heavily-preprocessed data 
where the app normalizes the input or machine-generated input.
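
To make the case-sensitivity and punctuation points concrete, here is a minimal 
Python sketch. It does not use Solr or Lucene at all: whitespace_analyze and 
general_analyze are hypothetical toy stand-ins that only roughly approximate 
what a whitespace-only type and a general-purpose text type do, and 
matching_terms is an illustrative helper, not a highlighter API.

```python
import re


def whitespace_analyze(text):
    """Toy whitespace-only analysis: split on whitespace, change nothing else."""
    return text.split()


def general_analyze(text):
    """Toy general-purpose analysis: keep runs of word characters, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def matching_terms(stored_value, query, analyze):
    """Re-analyze the stored value and keep the terms that also occur in the query."""
    query_terms = set(analyze(query))
    return [term for term in analyze(stored_value) if term in query_terms]


# Punctuation stays attached to the token under whitespace-only analysis.
print(whitespace_analyze("dog dog."))

# The stored value "MOSH" never matches the query "mosh" without lowercasing.
print(matching_terms("MOSH", "mosh", whitespace_analyze))
print(matching_terms("MOSH", "mosh", general_analyze))
```

Under these assumptions the first call yields two different tokens, the second 
finds nothing to highlight, and the third finds one term, which is the 
behaviour described above for a case-sensitive field type.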

There’s no reason, BTW, to index your doc_text for highlighting purposes since 
the stored data is what counts. Unless, of course, you want to search on that 
field specifically.

Best,
Erick

> On Jun 7, 2020, at 11:32 PM, mosh bla  wrote:
> 
> 
> Thanks Erick for the reply. Your answer is exactly what I was expecting from 
> the highlight component but it seems like I am getting different behaviour.
> I'll try to give a simple example and I hope you can explain where my 
> mistake is.
> Say I have the following fields configuration:
> 
> 
> 
> 
> And I indexed the following document:
> {
>"doc_text": "MOSH"
> }
> 
> When executing the following query 
> "http://.../select?q=doc_text_lw:mosh&hl=true&hl.fl=doc_text" - the document 
> is matched and returned in response, but the highlighted fragment is empty.
> I also tried to change 'hl.method' param to 'unified' and 'fastVector' but no 
> luck either. My conclusion was that 'hl.fl' param should be set to 
> 'doc_text_lw' and it must be also stored...
>  
>  
>  
> 
> Sent: Tuesday, June 02, 2020 at 3:15 PM
> From: "Erick Erickson" 
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting values of non stored fields
> Why do you think even variants need to be stored/highlighted? Usually
> when you store variants for ranking purposes those extra copies are
> invisible to the user. So most often people store exactly one copy
> of a particular field and highlight _that_ field in the return.
> 
> So say my field is f1 and I have indexed f1_1, f1_2, f1_3. I just store
> f1_1 and return the highlighted text from that one.
> 
> You could even just store the data only once in a field that’s never
> indexed and return/highlight that if you wanted.
> 
> Best,
> Erick
> 
>> On Jun 2, 2020, at 3:24 AM, mosheB  wrote:
>> 
>> Our use case is as follows:
>> We are indexing free text documents. Each document contains metadata fields
>> (such as author, creation date...) which are kinda small, and one "big"
>> field that holds the document's text itself.
>> 
>> For ranking purpose each field is indexed in more than one "variation" and
>> query is executed with edismax query parser. Things are working alright, but
>> now a new feature is requested by the customer - highlighting.
>> To enable highlighting every field must be stored, including all variations
>> of the big text field. This pushes our storage to the limit (and probably
>> the document cache...) and feels a bit redundant, as the stored value is
>> duplicated n times... Is there any way to “reference” stored value from one
>> field to another?
>> For example:
>> Say we have the following config:
>> > />
>> > />
>> 
>> 
>> 
>> 
>> 
>> And we execute the following query:
>> http://.../select?defType=edismax&q=desired_terms&qf=doc_text^2
>> doc_text_bigrams^3
>> doc_text_phrases^4&hl=on&hl.fl=doc_text,doc_text_bigrams,doc_text_phrases
>> 
>> Highlight fragments in response will be blank if match occurred on the
>> non-stored fields (doc_text_bigrams or doc_text_phrases). Is it possible to
>> pass extra parameter to the highlight component, to point it to the stored
>> data of the “original” doc_text field? a kind of “stored value reference
>> field”?
>> 
>> Thanks in advance.
>> 
>> 
>> 
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>  



Re: Highlighting values of non stored fields

2020-06-07 Thread mosh bla


Thanks Erick for the reply. Your answer is exactly what I was expecting from 
the highlight component but it seems like I am getting different behaviour.
I'll try to give a simple example and I hope you can explain where my 
mistake is.
Say I have the following fields configuration:



 
And I indexed the following document:
{
"doc_text": "MOSH"
}
 
When executing the following query 
"http://.../select?q=doc_text_lw:mosh&hl=true&hl.fl=doc_text" - the document is 
matched and returned in response, but the highlighted fragment is empty.
I also tried to change 'hl.method' param to 'unified' and 'fastVector' but no 
luck either. My conclusion was that 'hl.fl' param should be set to 
'doc_text_lw' and it must be also stored...
 
 
 

Sent: Tuesday, June 02, 2020 at 3:15 PM
From: "Erick Erickson" 
To: solr-user@lucene.apache.org
Subject: Re: Highlighting values of non stored fields
Why do you think even variants need to be stored/highlighted? Usually
when you store variants for ranking purposes those extra copies are
invisible to the user. So most often people store exactly one copy
of a particular field and highlight _that_ field in the return.

So say my field is f1 and I have indexed f1_1, f1_2, f1_3. I just store
f1_1 and return the highlighted text from that one.

You could even just store the data only once in a field that’s never
indexed and return/highlight that if you wanted.

Best,
Erick

> On Jun 2, 2020, at 3:24 AM, mosheB  wrote:
>
> Our use case is as follows:
> We are indexing free text documents. Each document contains metadata fields
> (such as author, creation date...) which are kinda small, and one "big"
> field that holds the document's text itself.
>
> For ranking purpose each field is indexed in more than one "variation" and
> query is executed with edismax query parser. Things are working alright, but
> now a new feature is requested by the customer - highlighting.
> To enable highlighting every field must be stored, including all variations
> of the big text field. This pushes our storage to the limit (and probably
> the document cache...) and feels a bit redundant, as the stored value is
> duplicated n times... Is there any way to “reference” stored value from one
> field to another?
> For example:
> Say we have the following config:
>  />
>  />
>
> 
> 
> 
>
> And we execute the following query:
> http://.../select?defType=edismax&q=desired_terms&qf=doc_text^2
> doc_text_bigrams^3
> doc_text_phrases^4&hl=on&hl.fl=doc_text,doc_text_bigrams,doc_text_phrases
>
> Highlight fragments in response will be blank if match occurred on the
> non-stored fields (doc_text_bigrams or doc_text_phrases). Is it possible to
> pass extra parameter to the highlight component, to point it to the stored
> data of the “original” doc_text field? a kind of “stored value reference
> field”?
>
> Thanks in advance.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
 


Re: Highlighting values of non stored fields

2020-06-02 Thread Erick Erickson
Why do you think even variants need to be stored/highlighted? Usually
when you store variants for ranking purposes those extra copies are
invisible to the user. So most often people store exactly one copy
of a particular field and highlight _that_ field in the return.

So say my field is f1 and I have indexed f1_1, f1_2, f1_3. I just store
f1_1 and return the highlighted text from that one.

You could even just store the data only once in a field that’s never
indexed and return/highlight that if you wanted. 
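
As a rough illustration of the idea, the sketch below builds request 
parameters that search across all the indexed variants but ask for 
highlighting only on the single stored field. The field names are taken from 
the question; the query text is a placeholder, and the host and collection are 
left out because they are not given in this thread.

```python
from urllib.parse import parse_qs, urlencode

# Search all indexed variants (qf), but highlight only the one stored field (hl.fl).
params = {
    "defType": "edismax",
    "q": "desired_terms",  # placeholder query text
    "qf": "doc_text^2 doc_text_bigrams^3 doc_text_phrases^4",
    "hl": "on",
    "hl.fl": "doc_text",
}

query_string = urlencode(params)
print("/select?" + query_string)

# Round-trip check: the highlighter is pointed at doc_text only.
decoded = parse_qs(query_string)
print(decoded["hl.fl"])
```

Whether a match found only through doc_text_bigrams or doc_text_phrases gets 
marked in doc_text still depends on how doc_text itself is analyzed, so this 
is something to try out rather than a guaranteed fix.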

Best,
Erick

> On Jun 2, 2020, at 3:24 AM, mosheB  wrote:
> 
> Our use case is as follows:
> We are indexing free text documents. Each document contains metadata fields
> (such as author, creation date...) which are kinda small, and one "big"
> field that holds the document's text itself.
> 
> For ranking purpose each field is indexed in more than one "variation" and
> query is executed with edismax query parser. Things are working alright, but
> now a new feature is requested by the customer - highlighting.
> To enable highlighting every field must be stored, including all variations
> of the big text field. This pushes our storage to the limit (and probably
> the document cache...) and feels a bit redundant, as the stored value is
> duplicated n times... Is there any way to “reference” stored value from one
> field to another?
> For example:
> Say we have the following config:
>  />
>  />
> 
> 
> 
> 
> 
> And we execute the following query:
> http://.../select?defType=edismax&q=desired_terms&qf=doc_text^2
> doc_text_bigrams^3
> doc_text_phrases^4&hl=on&hl.fl=doc_text,doc_text_bigrams,doc_text_phrases
> 
> Highlight fragments in response will be blank if match occurred  on the
> non-stored fields (doc_text_bigrams or doc_text_phrases). Is it possible to
> pass extra parameter to the highlight component, to point it to the stored
> data of the “original” doc_text field? a kind of “stored value reference
> field”?
> 
> Thanks in advance.
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Highlighting values of non stored fields

2020-06-02 Thread mosheB
Our use case is as follows:
We are indexing free text documents. Each document contains metadata fields
(such as author, creation date...) which are kinda small, and one "big"
field that holds the document's text itself.

For ranking purpose each field is indexed in more than one "variation" and
query is executed with edismax query parser. Things are working alright, but
now a new feature is requested by the customer - highlighting.
To enable highlighting every field must be stored, including all variations
of the big text field. This pushes our storage to the limit (and probably
the document cache...) and feels a bit redundant, as the stored value is
duplicated n times... Is there any way to “reference” stored value from one
field to another?
For example:
Say we have the following config:







And we execute the following query:
http://.../select?defType=edismax&q=desired_terms&qf=doc_text^2
doc_text_bigrams^3
doc_text_phrases^4&hl=on&hl.fl=doc_text,doc_text_bigrams,doc_text_phrases

Highlight fragments in response will be blank if match occurred  on the
non-stored fields (doc_text_bigrams or doc_text_phrases). Is it possible to
pass extra parameter to the highlight component, to point it to the stored
data of the “original” doc_text field? a kind of “stored value reference
field”?

Thanks in advance.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Jörn Franke
Hmm maybe more insights on the use case would be useful. It looks like what 
David says about metadata could make sense in your scenario depending on the 
requirements...



> Am 24.05.2020 um 13:20 schrieb Serkan KAZANCI :
> 
> Thanks Jörn for the answer,
> 
> I use post tool to index html documents, so the html tags are stripped when 
> indexed and stored. The remaining text is mapped to the field content by 
> default. 
> 
> hl.fragsize=0 works perfect for the indexed document, but I can only display 
> highlighted text-only version of html document because the html tags are 
> stripped.
> 
> So is it possible to index and store the html document without stripping the 
> html tags, so that when the document is displayed with hl.fragsize=0 
> parameter, it is displayed as original html document?
> 
> Or
> 
> Is it possible to give a whole html document as a parameter to the Unified 
> highlighter so that output is also a highlighted html document?
> 
> Or 
> 
> Do you have a better idea to highlight the keywords of the whole html 
> document? 
> 
> Thanks,
> 
> Serkan
> 
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com] 
> Sent: Sunday, May 24, 2020 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting a whole html document using Unified highlighter
> 
> hl.fragsize=0
> 
> https://lucene.apache.org/solr/guide/8_5/highlighting.html
> 
> 
> 
>> Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
>> 
>> Hi,
>> 
>> 
>> 
>> I use solr to search over a million html documents, when a document is
>> searched and displayed, I want to highlight the keywords that are used to
>> find and access the document.
>> 
>> 
>> 
>> Unified highlighter is fast, accurate and supports different languages but
>> only highlights passages with given parameters.
>> 
>> 
>> 
>> How can I highlight a whole html document using Unified highlighter? I have
>> written a php code but it cannot do the complex word stemming functions.
>> 
>> 
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> 
>> Serkan
>> 
> 


Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Serkan KAZANCI
All clear. 

Thanks David,

> On 24 May 2020, at 18:57, David Smiley  wrote:
> 
> These strategies are not mutually exclusive.  Yes I do suggest having the
> HTML in whole go into one searchable field to satisfy your highlighting
> use-case.  But I can imagine you will also want some document metadata in
> separate fields.  It's up to you to parse that out somehow and add it.  You
> mentioned you are using bin/post but, IMO, that capability is more for
> quick experimentation / tutorials, some POCs, or very simple use-cases.  I
> doubt you can do what I suggest while still using bin/post.  You might be
> able to use "SolrCell" AKA ExtractingRequestHandler directly, which is what
> bin/post does with HTML.
> 
> Good luck!
> 
> ~ David
> 
> 
>> On Sun, May 24, 2020 at 10:52 AM Serkan KAZANCI 
>> wrote:
>> 
>> Hi David,
>> 
>> I have many meta-tags in html documents like <meta ... content="2019-10-15T23:59:59Z"> which matches the field descriptions in
>> schema file.
>> 
>> As I understand, you propose to index the whole html document as one text
>> file and map it to a search field (do you?) . That would take care of the
>> html highlight issue, however I would lose the field information coming
>> from meta-tags .
>> 
>> So is it possible to index the html document as html document ?
>> (preserving the field data coming from meta-tags and not strip the html
>> tags)
>> 
>> Then I could use solr.HTMLStripCharFilterFactory for analysis.
>> 
>> Thank You,
>> 
>> Serkan,
>> 
>> 
>> 
>> 
>> -Original Message-
>> From: David Smiley [mailto:dsmi...@apache.org]
>> Sent: Sunday, May 24, 2020 5:26 PM
>> To: solr-user
>> Subject: Re: highlighting a whole html document using Unified highlighter
>> 
>> Instead of stripping the HTML for the stored value, leave it be and remove
>> it during the analysis stage with solr.HTMLStripCharFilterFactory
>> <
>> https://builds.apache.org/job/Solr-reference-guide-master/javadoc/charfilterfactories.html#solr-htmlstripcharfilterfactory
>>> 
>> This means the searchable text will only be the visible text, basically.
>> And the highlighter will only highlight what's searchable.
>> 
>> I suggest doing some experimentation for searching for words that you know
>> are directly adjacent (no spaces) to opening and closing tags to make sure
>> that the inserted HTML markup for the highlight balance correctly.  Use a
>> "phrase query" (quoted) as well, and see if you can highlight around markup
>> like "phrasequery" to see what happens.  You might need to set
>> hl.weightMatches=false to ensure the words separately are highlighted.  I
>> suspect you will find there is a problem, and the root cause is here:
>> LUCENE-5734 <https://issues.apache.org/jira/browse/LUCENE-5734>   It's on
>> my long TODO list but hasn't bitten me lately so I've neglected it.
>> 
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>> 
>> 
>> On Sun, May 24, 2020 at 7:20 AM Serkan KAZANCI 
>> wrote:
>> 
>>> Thanks Jörn for the answer,
>>> 
>>> I use post tool to index html documents, so the html tags are stripped
>>> when indexed and stored. The remaining text is mapped to the field
>> content
>>> by default.
>>> 
>>> hl.fragsize=0 works perfect for the indexed document, but I can only
>>> display highlighted text-only version of html document because the html
>>> tags are stripped.
>>> 
>>> So is it possible to index and store the html document without stripping
>>> the html tags, so that when the document is displayed with hl.fragsize=0
>>> parameter, it is displayed as original html document?
>>> 
>>> Or
>>> 
>>> Is it possible to give a whole html document as a parameter to the
>> Unified
>>> highlighter so that output is also a highlighted html document?
>>> 
>>> Or
>>> 
>>> Do you have a better idea to highlight the keywords of the whole html
>>> document?
>>> 
>>> Thanks,
>>> 
>>> Serkan
>>> 
>>> -Original Message-
>>> From: Jörn Franke [mailto:jornfra...@gmail.com]
>>> Sent: Sunday, May 24, 2020 1:22 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: highlighting a whole html document using Unified highlighter
>>> 
>>> hl.fragsize=0
>>> 
>>> https://lucene.apache.org/solr/guide/8_5/highlighting.html
>>> 
>>> 
>>> 
>>>> Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
>>>> 
>>>> Hi,
>>>> 
>>>> 
>>>> 
>>>> I use solr to search over a million html documents, when a document is
>>>> searched and displayed, I want to highlight the keywords that are used
>> to
>>>> find and access the document.
>>>> 
>>>> 
>>>> 
>>>> Unified highlighter is fast, accurate and supports different languages
>>> but
>>>> only highlights passages with given parameters.
>>>> 
>>>> 
>>>> 
>>>> How can I highlight a whole html document using Unified highlighter? I
>>> have
>>>> written a php code but it cannot do the complex word stemming
>> functions.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> 
>>>> 
>>>> Serkan
>>>> 
>>> 
>>> 
>> 
>> 



Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread David Smiley
These strategies are not mutually exclusive.  Yes I do suggest having the
HTML in whole go into one searchable field to satisfy your highlighting
use-case.  But I can imagine you will also want some document metadata in
separate fields.  It's up to you to parse that out somehow and add it.  You
mentioned you are using bin/post but, IMO, that capability is more for
quick experimentation / tutorials, some POCs, or very simple use-cases.  I
doubt you can do what I suggest while still using bin/post.  You might be
able to use "SolrCell" AKA ExtractingRequestHandler directly, which is what
bin/post does with HTML.
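
One possible way to "parse that out somehow" on the client side is sketched 
below with Python's standard html.parser module. This is only an 
illustration: the field name content_html, the meta name date_published and 
the document id are hypothetical, and sending the resulting document to Solr 
is left out.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name] = content


def build_document(doc_id, html):
    """Keep the raw HTML in one field and copy each meta tag into its own field."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    document = dict(collector.meta)
    # Explicit fields are set last so a meta tag cannot overwrite them.
    document["id"] = doc_id
    document["content_html"] = html
    return document


page = (
    '<html><head><meta name="date_published" content="2019-10-15T23:59:59Z">'
    "</head><body><p>Hello</p></body></html>"
)
print(build_document("doc-1", page))
```

The raw HTML field can then be analyzed with an HTML-stripping char filter, 
while the metadata fields are indexed as ordinary fields.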

Good luck!

~ David


On Sun, May 24, 2020 at 10:52 AM Serkan KAZANCI 
wrote:

> Hi David,
>
> I have many meta-tags in html documents like <meta ... content="2019-10-15T23:59:59Z"> which matches the field descriptions in
> schema file.
>
> As I understand, you propose to index the whole html document as one text
> file and map it to a search field (do you?) . That would take care of the
> html highlight issue, however I would lose the field information coming
> from meta-tags .
>
> So is it possible to index the html document as html document ?
> (preserving the field data coming from meta-tags and not strip the html
> tags)
>
> Then I could use solr.HTMLStripCharFilterFactory for analysis.
>
> Thank You,
>
> Serkan,
>
>
>
>
> -Original Message-
> From: David Smiley [mailto:dsmi...@apache.org]
> Sent: Sunday, May 24, 2020 5:26 PM
> To: solr-user
> Subject: Re: highlighting a whole html document using Unified highlighter
>
> Instead of stripping the HTML for the stored value, leave it be and remove
> it during the analysis stage with solr.HTMLStripCharFilterFactory
> <
> https://builds.apache.org/job/Solr-reference-guide-master/javadoc/charfilterfactories.html#solr-htmlstripcharfilterfactory
> >
> This means the searchable text will only be the visible text, basically.
> And the highlighter will only highlight what's searchable.
>
> I suggest doing some experimentation for searching for words that you know
> are directly adjacent (no spaces) to opening and closing tags to make sure
> that the inserted HTML markup for the highlight balance correctly.  Use a
> "phrase query" (quoted) as well, and see if you can highlight around markup
> like "phrasequery" to see what happens.  You might need to set
> hl.weightMatches=false to ensure the words separately are highlighted.  I
> suspect you will find there is a problem, and the root cause is here:
> LUCENE-5734 <https://issues.apache.org/jira/browse/LUCENE-5734>   It's on
> my long TODO list but hasn't bitten me lately so I've neglected it.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, May 24, 2020 at 7:20 AM Serkan KAZANCI 
> wrote:
>
> > Thanks Jörn for the answer,
> >
> > I use post tool to index html documents, so the html tags are stripped
> > when indexed and stored. The remaining text is mapped to the field
> content
> > by default.
> >
> > hl.fragsize=0 works perfect for the indexed document, but I can only
> > display highlighted text-only version of html document because the html
> > tags are stripped.
> >
> > So is it possible to index and store the html document without stripping
> > the html tags, so that when the document is displayed with hl.fragsize=0
> > parameter, it is displayed as original html document?
> >
> > Or
> >
> > Is it possible to give a whole html document as a parameter to the
> Unified
> > highlighter so that output is also a highlighted html document?
> >
> > Or
> >
> > Do you have a better idea to highlight the keywords of the whole html
> > document?
> >
> >  Thanks,
> >
> >  Serkan
> >
> > -Original Message-
> > From: Jörn Franke [mailto:jornfra...@gmail.com]
> > Sent: Sunday, May 24, 2020 1:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: highlighting a whole html document using Unified highlighter
> >
> > hl.fragsize=0
> >
> > https://lucene.apache.org/solr/guide/8_5/highlighting.html
> >
> >
> >
> > > Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
> > >
> > > Hi,
> > >
> > >
> > >
> > > I use solr to search over a million html documents, when a document is
> > > searched and displayed, I want to highlight the keywords that are used
> to
> > > find and access the document.
> > >
> > >
> > >
> > > Unified highlighter is fast, accurate and supports different languages
> > but
> > > only highlights passages with given parameters.
> > >
> > >
> > >
> > > How can I highlight a whole html document using Unified highlighter? I
> > have
> > > written a php code but it cannot do the complex word stemming
> functions.
> > >
> > >
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > Serkan
> > >
> >
> >
>
>


RE: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Serkan KAZANCI
Hi David,

I have many meta-tags in html documents like <meta ... content="2019-10-15T23:59:59Z"> which matches the field descriptions in schema 
file.

As I understand, you propose to index the whole html document as one text file 
and map it to a search field (do you?) . That would take care of the html 
highlight issue, however I would lose the field information coming from 
meta-tags .

So is it possible to index the html document as html document ? (preserving the 
field data coming from meta-tags and not strip the html tags) 

Then I could use solr.HTMLStripCharFilterFactory for analysis.

Thank You,

Serkan,




-Original Message-
From: David Smiley [mailto:dsmi...@apache.org] 
Sent: Sunday, May 24, 2020 5:26 PM
To: solr-user
Subject: Re: highlighting a whole html document using Unified highlighter

Instead of stripping the HTML for the stored value, leave it be and remove
it during the analysis stage with solr.HTMLStripCharFilterFactory
<https://builds.apache.org/job/Solr-reference-guide-master/javadoc/charfilterfactories.html#solr-htmlstripcharfilterfactory>
This means the searchable text will only be the visible text, basically.
And the highlighter will only highlight what's searchable.

I suggest doing some experimentation for searching for words that you know
are directly adjacent (no spaces) to opening and closing tags to make sure
that the inserted HTML markup for the highlight balance correctly.  Use a
"phrase query" (quoted) as well, and see if you can highlight around markup
like "phrasequery" to see what happens.  You might need to set
hl.weightMatches=false to ensure the words separately are highlighted.  I
suspect you will find there is a problem, and the root cause is here:
LUCENE-5734 <https://issues.apache.org/jira/browse/LUCENE-5734>   It's on
my long TODO list but hasn't bitten me lately so I've neglected it.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, May 24, 2020 at 7:20 AM Serkan KAZANCI 
wrote:

> Thanks Jörn for the answer,
>
> I use post tool to index html documents, so the html tags are stripped
> when indexed and stored. The remaining text is mapped to the field content
> by default.
>
> hl.fragsize=0 works perfect for the indexed document, but I can only
> display highlighted text-only version of html document because the html
> tags are stripped.
>
> So is it possible to index and store the html document without stripping
> the html tags, so that when the document is displayed with hl.fragsize=0
> parameter, it is displayed as original html document?
>
> Or
>
> Is it possible to give a whole html document as a parameter to the Unified
> highlighter so that output is also a highlighted html document?
>
> Or
>
> Do you have a better idea to highlight the keywords of the whole html
> document?
>
>  Thanks,
>
>  Serkan
>
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Sunday, May 24, 2020 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting a whole html document using Unified highlighter
>
> hl.fragsize=0
>
> https://lucene.apache.org/solr/guide/8_5/highlighting.html
>
>
>
> > Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
> >
> > Hi,
> >
> >
> >
> > I use solr to search over a million html documents, when a document is
> > searched and displayed, I want to highlight the keywords that are used to
> > find and access the document.
> >
> >
> >
> > Unified highlighter is fast, accurate and supports different languages
> but
> > only highlights passages with given parameters.
> >
> >
> >
> > How can I highlight a whole html document using Unified highlighter? I
> have
> > written a php code but it cannot do the complex word stemming functions.
> >
> >
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Serkan
> >
>
>



Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread David Smiley
Instead of stripping the HTML for the stored value, leave it be and remove
it during the analysis stage with solr.HTMLStripCharFilterFactory
<https://builds.apache.org/job/Solr-reference-guide-master/javadoc/charfilterfactories.html#solr-htmlstripcharfilterfactory>
This means the searchable text will only be the visible text, basically.
And the highlighter will only highlight what's searchable.

I suggest doing some experimentation for searching for words that you know
are directly adjacent (no spaces) to opening and closing tags to make sure
that the inserted HTML markup for the highlight balance correctly.  Use a
"phrase query" (quoted) as well, and see if you can highlight around markup
like "phrasequery" to see what happens.  You might need to set
hl.weightMatches=false to ensure the words separately are highlighted.  I
suspect you will find there is a problem, and the root cause is here:
LUCENE-5734 <https://issues.apache.org/jira/browse/LUCENE-5734>   It's on
my long TODO list but hasn't bitten me lately so I've neglected it.
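
The idea can be pictured with a small toy model in Python. It is not the 
Lucene implementation: tokens_with_offsets and highlight are hypothetical 
names, every tag is treated as a word break (a simplification), and the <em> 
markers are just an assumed choice. The point it illustrates is that the 
stored value keeps its markup, analysis sees only the visible text, and the 
highlight tags are inserted at offsets into the original stored HTML.

```python
import re

TAG = re.compile(r"<[^>]*>")
WORD = re.compile(r"\w+")


def tokens_with_offsets(html):
    """Yield (lowercased term, start, end) for visible words only.

    Tags are skipped, but start/end are offsets into the original HTML.
    """
    position = 0
    for tag in list(TAG.finditer(html)) + [None]:
        limit = tag.start() if tag else len(html)
        for match in WORD.finditer(html, position, limit):
            yield match.group().lower(), match.start(), match.end()
        if tag:
            position = tag.end()


def highlight(html, query_terms, pre="<em>", post="</em>"):
    """Wrap matching visible words in pre/post while leaving the markup intact."""
    terms = {term.lower() for term in query_terms}
    pieces, last = [], 0
    for term, start, end in tokens_with_offsets(html):
        if term in terms:
            pieces.append(html[last:start])
            pieces.append(pre + html[start:end] + post)
            last = end
    pieces.append(html[last:])
    return "".join(pieces)


stored = "<p>Solr <b>highlighting</b> keeps the markup.</p>"
print(highlight(stored, ["highlighting", "markup"]))
# prints: <p>Solr <b><em>highlighting</em></b> keeps the <em>markup</em>.</p>
```

Because this toy marks each word separately, the inserted tags always nest 
inside the existing markup. A highlight that spans several words across a tag 
boundary is where unbalanced output could appear, which is the case worth 
testing by hand.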

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, May 24, 2020 at 7:20 AM Serkan KAZANCI 
wrote:

> Thanks Jörn for the answer,
>
> I use post tool to index html documents, so the html tags are stripped
> when indexed and stored. The remaining text is mapped to the field content
> by default.
>
> hl.fragsize=0 works perfect for the indexed document, but I can only
> display highlighted text-only version of html document because the html
> tags are stripped.
>
> So is it possible to index and store the html document without stripping
> the html tags, so that when the document is displayed with hl.fragsize=0
> parameter, it is displayed as original html document?
>
> Or
>
> Is it possible to give a whole html document as a parameter to the Unified
> highlighter so that output is also a highlighted html document?
>
> Or
>
> Do you have a better idea to highlight the keywords of the whole html
> document?
>
>  Thanks,
>
>  Serkan
>
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Sunday, May 24, 2020 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting a whole html document using Unified highlighter
>
> hl.fragsize=0
>
> https://lucene.apache.org/solr/guide/8_5/highlighting.html
>
>
>
> > Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
> >
> > Hi,
> >
> >
> >
> > I use solr to search over a million html documents, when a document is
> > searched and displayed, I want to highlight the keywords that are used to
> > find and access the document.
> >
> >
> >
> > Unified highlighter is fast, accurate and supports different languages
> but
> > only highlights passages with given parameters.
> >
> >
> >
> > How can I highlight a whole html document using Unified highlighter? I
> have
> > written a php code but it cannot do the complex word stemming functions.
> >
> >
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Serkan
> >
>
>


RE: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Serkan KAZANCI
Thanks Jörn for the answer,

I use the post tool to index html documents, so the html tags are stripped when 
indexed and stored. The remaining text is mapped to the field content by 
default. 

hl.fragsize=0 works perfectly for the indexed document, but I can only display 
the highlighted text-only version of the html document because the html tags are 
stripped.

So is it possible to index and store the html document without stripping the 
html tags, so that when the document is displayed with hl.fragsize=0 parameter, 
it is displayed as original html document?

Or

Is it possible to give a whole html document as a parameter to the Unified 
highlighter so that output is also a highlighted html document?

Or 

Do you have a better idea to highlight the keywords of the whole html document? 

 Thanks,
 
 Serkan

-Original Message-
From: Jörn Franke [mailto:jornfra...@gmail.com] 
Sent: Sunday, May 24, 2020 1:22 PM
To: solr-user@lucene.apache.org
Subject: Re: highlighting a whole html document using Unified highlighter

hl.fragsize=0

https://lucene.apache.org/solr/guide/8_5/highlighting.html



> Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
> 
> Hi,
> 
> 
> 
> I use solr to search over a million html documents, when a document is
> searched and displayed, I want to highlight the keywords that are used to
> find and access the document.
> 
> 
> 
> Unified highlighter is fast, accurate and supports different languages but
> only highlights passages with given parameters.
> 
> 
> 
> How can I highlight a whole html document using Unified highlighter? I have
> written a php code but it cannot do the complex word stemming functions.
> 
> 
> 
> 
> 
> Thanks,
> 
> 
> 
> Serkan
> 



Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Jörn Franke
hl.fragsize=0

https://lucene.apache.org/solr/guide/8_5/highlighting.html
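
For readers who want to see the whole request, here is a minimal sketch of the
query string this suggestion implies. The field name "content" and the query
are only assumptions for illustration; hl.fragsize=0 asks the highlighter to
treat the whole field value as one fragment. Note that hl.maxAnalyzedChars
still limits how much text is analysed, so long documents may need a larger
value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: builds the parameters for whole-field highlighting. */
public class WholeFieldHighlightQuery {

    static String build(String query, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");
        params.put("hl.method", "unified");
        params.put("hl.fl", field);
        params.put("hl.fragsize", "0"); // 0 = no fragmenting, highlight the whole value
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // "content" is the assumed field name; append the result to /select?
        System.out.println(build("content:solr", "content"));
    }
}
```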



> Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
> 
> Hi,
> 
> 
> 
> I use solr to search over a million html documents, when a document is
> searched and displayed, I want to highlight the keywords that are used to
> find and access the document.
> 
> 
> 
> Unified highlighter is fast, accurate and supports different languages but
> only highlights passages with given parameters.
> 
> 
> 
> How can I highlight a whole html document using Unified highlighter? I have
> written a php code but it cannot do the complex word stemming functions.
> 
> 
> 
> 
> 
> Thanks,
> 
> 
> 
> Serkan
> 


highlighting a whole html document using Unified highlighter

2020-05-24 Thread Serkan KAZANCI
Hi,

 

I use solr to search over a million html documents, when a document is
searched and displayed, I want to highlight the keywords that are used to
find and access the document.

 

Unified highlighter is fast, accurate and supports different languages but
only highlights passages with given parameters.

 

How can I highlight a whole html document using Unified highlighter? I have
written a php code but it cannot do the complex word stemming functions.

 

 

Thanks,

 

Serkan



Re: Highlighting Solr 8

2020-05-22 Thread David Smiley
What did you end up doing, Eric?  Did you migrate to the Unified
Highlighter?
~ David


On Wed, Oct 16, 2019 at 4:36 PM Eric Allen 
wrote:

> Thanks for the reply.
>
> Currently we are migrating from Solr 4 to Solr 8. Under Solr 4 we wrote our
> own highlighter because the provided one was too slow for our documents.
>
> We deal with many large documents, but we have full term vectors already.
> So as I understand it from my reading of the code the unified highlighter
> should be fast even on these large documents.
>
> The concern about alternate fields was if the highlighter was slow we
> could just return highlights from one field if they existed and if not then
> highlight the other fields.
>
> From my research I'm leaning towards returning highlights from all the
> fields we are interested in because I feel it will be fast.
>
> Eric Allen - Software Developer, NetDocuments
> eric.al...@netdocuments.com | O: 801.989.9691 | C: 801.989.9691
>
> -Original Message-
> From: sasarun 
> Sent: Wednesday, October 16, 2019 2:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting Solr 8
>
> Hi Eric,
>
> Unified highlighter does not have an option to provide alternate field
> when highlighting. That option is available with Original and fast vector
> highlighter. As indicated in the Solr documentation, Unified is the
> recommended method for highlighting to meet most of the use cases. Please
> do share more details in case you are facing any specific issue with
> highlighting.
>
> Thanks,
>
> Arun
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Re: Integrate highlighting data within main search results

2020-05-12 Thread Kamal Kishore Aggarwal
any update on this guys

On Wed, May 6, 2020 at 3:39 PM Kamal Kishore Aggarwal 
wrote:

> Hi,
>
> I am using highlighting feature in solr 8.3 with default method. With
> current behaviour, main search results and highlighted results are shown in
> different blocks. Is there a way we can implement highlighting within the
> search main results, without having to return extra block for highlighting?
>
> I believe that due to performance factor(like default limit values for
> hl.maxAnalyzedChars, hl.snippets, hl.fragsize) that highlight is returned
> as separate component. But, if someone has written custom component to
> integrate both, please share the steps. Also, please share the performance
> of it.
>
> Regards
>
> Kamal Kishore
>


Integrate highlighting data within main search results

2020-05-06 Thread Kamal Kishore Aggarwal
Hi,

I am using the highlighting feature in Solr 8.3 with the default method. With
the current behaviour, the main search results and the highlighted results are
returned in different blocks. Is there a way to implement highlighting within
the main search results, without having to return an extra block for
highlighting?

I believe highlighting is returned as a separate component for performance
reasons (for example the default limits for hl.maxAnalyzedChars, hl.snippets
and hl.fragsize). But if someone has written a custom component to integrate
both, please share the steps, and please also share how it performs.
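
One way to get a single block without a custom component is to merge the two
parts on the client. The following is only a sketch under that assumption: it
works on plain maps shaped like the JSON response (documents plus a
"highlighting" section keyed by document id), and the class and method names
are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side helper, not part of Solr. */
public class HighlightMerger {

    /** Copies each document and overwrites fields that have a highlighted snippet. */
    static List<Map<String, Object>> merge(List<Map<String, Object>> docs,
                                           Map<String, Map<String, List<String>>> highlighting,
                                           String idField) {
        List<Map<String, Object>> merged = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Map<String, Object> copy = new LinkedHashMap<>(doc);
            Map<String, List<String>> snippets =
                    highlighting.getOrDefault(String.valueOf(doc.get(idField)), Map.of());
            snippets.forEach((field, values) -> {
                if (!values.isEmpty()) {
                    copy.put(field, values.get(0)); // first snippet replaces the stored value
                }
            });
            merged.add(copy);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs =
                List.of(Map.<String, Object>of("id", "1", "title", "Solr highlighting"));
        Map<String, Map<String, List<String>>> hl =
                Map.of("1", Map.of("title", List.of("Solr <em>highlighting</em>")));
        System.out.println(merge(docs, hl, "id").get(0).get("title"));
    }
}
```

This keeps the server response unchanged, so it does not answer the
performance question for a server-side component.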

Regards

Kamal Kishore


Solr proximity search highlighting issue

2020-04-02 Thread Nirav Shah
Hello Dev Team,

I found some problem in highlighting module. Not all the search terms are 
getting highlighted.

Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR pos3)"~30=true
Indexed text: "pos1 pos2 pos3 pos4"

please find attached response xml screen shot from solr.

You can see that only two terms are highlighted like, "pos1 
pos2 pos3 pos4"

The scenario is same in Solr source code since long time (I have checked in 
Solr version 4 to version 7). The scenario is when term positions are in-order 
in document and query both.

Please let me know your view on this.

Thank you,
Amit




Re: FW: Solr proximity search highlighting issue

2020-04-02 Thread Charlie Hull
I may be wrong here, but the problem may be that the match was on your 
terms pos1 and pos2 (you don't need the pos3 term to match, due to the 
OR operator) and thus that's what's been highlighted.


There's a hl.q parameter that lets you supply a different query for 
highlighting to the one you're using for searching, perhaps that could 
have a different and more forgiving pattern that made sure all your 
terms were highlighted?
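
As a rough illustration of the hl.q idea, the sketch below builds a request in
which q does the searching and hl.q lists the terms to mark. The field name
"text" and the simpler highlight query are assumptions, not something taken
from the original report, and whether this gives the desired highlights would
need to be tested.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: q selects documents, hl.q decides which terms get marked. */
public class SeparateHighlightQuery {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String build(String searchQuery, String highlightQuery, String field) {
        return "q=" + enc(searchQuery)
                + "&hl=true"
                + "&hl.fl=" + enc(field)
                + "&hl.q=" + enc(highlightQuery);
    }

    public static void main(String[] args) {
        String q = "{!complexphrase inOrder=true}text:\"pos1 (pos2 OR pos3)\"~30";
        String hlq = "text:(pos1 pos2 pos3)"; // more forgiving: any of the three terms
        System.out.println(build(q, hlq, "text"));
    }
}
```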


Also, the XML didn't come through as this list strips attachments.

Best

Charlie

On 31/03/2020 19:27, Anil Shingala wrote:


Hello Dev Team,

I found some problem in highlighting module. Not all the search terms 
are getting highlighted.


Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR 
pos3)"~30=true


Indexed text: "pos1 pos2 pos3 pos4"

please find attached response xml screen shot from solr.

You can see that only two terms are highlighted like, "pos1 
pos2 pos3 pos4"


The scenario is same in Solr source code since long time (I have 
checked in Solr version 4 to version 7). The scenario is when term 
positions are in-order in document and query both.


Please let me know your view on this.

Regards,

Anil Shingala

*Knovos*
10521 Rosehaven Street, Suite 300 | Fairfax, VA 22030 (USA)
Office +1 703.226.1505

Main +1 703.226.1500 | +1 877.227.5457

ashing...@knovos.com | www.knovos.com


Washington DC | New York | London | Paris | Gandhinagar | Tokyo





--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



FW: Solr proximity search highlighting issue

2020-03-31 Thread Anil Shingala


Hello Dev Team,

I found some problem in highlighting module. Not all the search terms are 
getting highlighted.

Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR pos3)"~30=true
Indexed text: "pos1 pos2 pos3 pos4"

please find attached response xml screen shot from solr.

You can see that only two terms are highlighted like, "pos1 
pos2 pos3 pos4"

The scenario is same in Solr source code since long time (I have checked in 
Solr version 4 to version 7). The scenario is when term positions are in-order 
in document and query both.

Please let me know your view on this.

Regards,
Anil Shingala
Knovos
10521 Rosehaven Street, Suite 300 | Fairfax, VA 22030 (USA)
Office +1 703.226.1505
Main +1 703.226.1500 | +1 877.227.5457
ashing...@knovos.com | www.knovos.com

Washington DC | New York | London | Paris | Gandhinagar | Tokyo





Re: use highlighting on multivalued fields with positionIncrementGap 0

2020-02-23 Thread Paras Lehana
I haven't worked with highlighting much but what's the need to store terms
in multivalued field?

On Fri, 14 Feb 2020 at 20:04, Nicolas Franck 
wrote:

> I'm trying to use highlighting on a multivalued text field (analysis not
> so important) ..
>
>
>   { text: [ "hello", "world" ], id: 1 }
>
> but I want to match across the string boundaries:
>
>   q=text:"hello world"
>
> This works by setting the attribute
> positionIncrementGap to 0, but then the highlighting entry is empty
>
>   "highlighting": { "1" : { "text" : [] } }
>
> Parameters are:
>
>   hl=true
>   hl.fl=text
>   hl.snippets=50
>   hl.fragSize=1
>
> Any idea why this happens?
> I guess this gap is internal stuff handled by Lucene that Solr doesn't
> know about?
> (as for lucene, there are no multivalued fields!)
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



use highlighting on multivalued fields with positionIncrementGap 0

2020-02-14 Thread Nicolas Franck
I'm trying to use highlighting on a multivalued text field (analysis not so 
important) ..


  { text: [ "hello", "world" ], id: 1 }

but I want to match across the string boundaries:

  q=text:"hello world"

This works by setting the attribute
positionIncrementGap to 0, but then the highlighting entry is empty

  "highlighting": { "1" : { "text" : [] } }

Parameters are:

  hl=true
  hl.fl=text
  hl.snippets=50
  hl.fragSize=1

Any idea why this happens? 
I guess this gap is internal stuff handled by Lucene that Solr doesn't know 
about?
(as for lucene, there are no multivalued fields!)
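
To make the role of the gap concrete, here is a simplified model of how token
positions are assigned across the values of a multivalued field. It is a sketch
only (whitespace tokens, one position per token) and not Lucene's code; it
shows why a phrase can match across values when the gap is 0, but it does not
explain the empty highlighting entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model, not Lucene code: the gap is added between consecutive values. */
public class PositionGapDemo {

    static List<Integer> positions(List<String> values, int positionIncrementGap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += positionIncrementGap; // extra distance between two values
            }
            for (String token : values.get(v).trim().split("\\s+")) {
                position++;
                result.add(position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> text = List.of("hello", "world");
        System.out.println(positions(text, 0));   // [0, 1]   -> adjacent, phrase can match
        System.out.println(positions(text, 100)); // [0, 101] -> far apart, phrase cannot match
    }
}
```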



Re: Highlighting on typing in search box

2019-11-24 Thread Paras Lehana
Hi rhys,

You are actually looking for an autocomplete! I work for the Auto-Suggest
(different names for the same thing) team at Indiamart.

Although we have a long journey making our Auto-Suggest one of the fastest
on the internet, I hope this summary will help you. You can always
connect with me for any doubt.

   1. If you don't have a high number of documents, you can use wildcards
   in the query for a simpler implementation. That means you just convert
   your query from q=app to q=app* so that it matches "apple". Wildcards have
   their own cons (constant scoring, high QTime), and since it's a multi-term
   expansion, you cannot use many analyzers over the query. It performed
   well for us (under 50 ms QTime) until we had 22 million documents (which
   is still a large number of documents).

   2. Now that we have over 60 million documents, we have shifted to Edge
   NGrams
   <https://lucene.apache.org/solr/guide/8_3/tokenizers.html#edge-n-gram-tokenizer>
   on the index side. Using this, your highlighting should work as expected
   and as you do in normal searching.

   3. You also have Suggester
   <https://lucene.apache.org/solr/guide/8_3/suggester.html> component for
   Solr that you can use for basic suggester functionalities. Supports
   highlighting.


From my experience, the best solution is Edge NGrams (2). We have over 60
million documents and still a 25 ms QTime and we do complex scoring and
analysis.
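
To show what option 2 does at index time, here is a small sketch of edge n-gram
generation. It only mirrors the idea and is not Solr's implementation; the
minimum and maximum sizes are assumed values. Because every prefix is indexed
as its own term, the partial input "app" matches "apple" as an ordinary term
query, with no wildcard needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: produces the prefixes an edge n-gram analysis would index. */
public class EdgeNGrams {

    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> indexed = edgeNGrams("apple", 1, 10);
        System.out.println(indexed);                 // [a, ap, app, appl, apple]
        System.out.println(indexed.contains("app")); // true
    }
}
```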

On Thu, 21 Nov 2019 at 22:43, rhys J  wrote:

> Thank you both! I've got an autocomplete working on a basic format right
> now, and I'm working on implementing it to be smart about which core it
> searches.
>
> On Thu, Nov 21, 2019 at 11:43 AM Jörn Franke  wrote:
>
> > It sounds like you look for a suggester.
> >
> > You can use the suggester of Solr.
> >
> > For the visualization part: Angular has a suggestion box that can ingest
> > the results from Solr.
> >
> > > Am 21.11.2019 um 16:42 schrieb rhys J :
> > >
> > > Are there any recommended APIs or code examples of using Solr and then
> > > highlighting results below the search box?
> > >
> > > I'm trying to implement a search box that will search solr as the user
> > > types, if that makes sense?
> > >
> > > Thanks,
> > >
> > > Rhys
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Highlighting on typing in search box

2019-11-21 Thread rhys J
Thank you both! I've got an autocomplete working on a basic format right
now, and I'm working on implementing it to be smart about which core it
searches.

On Thu, Nov 21, 2019 at 11:43 AM Jörn Franke  wrote:

> It sounds like you look for a suggester.
>
> You can use the suggester of Solr.
>
> For the visualization part: Angular has a suggestion box that can ingest
> the results from Solr.
>
> > Am 21.11.2019 um 16:42 schrieb rhys J :
> >
> > Are there any recommended APIs or code examples of using Solr and then
> > highlighting results below the search box?
> >
> > I'm trying to implement a search box that will search solr as the user
> > types, if that makes sense?
> >
> > Thanks,
> >
> > Rhys
>


Re: Highlighting on typing in search box

2019-11-21 Thread Jörn Franke
It sounds like you look for a suggester.

You can use the suggester of Solr.

For the visualization part: Angular has a suggestion box that can ingest the 
results from Solr.

> Am 21.11.2019 um 16:42 schrieb rhys J :
> 
> Are there any recommended APIs or code examples of using Solr and then
> highlighting results below the search box?
> 
> I'm trying to implement a search box that will search solr as the user
> types, if that makes sense?
> 
> Thanks,
> 
> Rhys


Re: Highlighting on typing in search box

2019-11-21 Thread David Hastings
you can modify the result in this SO question to fit your needs:

https://stackoverflow.com/questions/16742610/retrieve-results-from-solr-using-jquery-calls

On Thu, Nov 21, 2019 at 10:42 AM rhys J  wrote:

> Are there any recommended APIs or code examples of using Solr and then
> highlighting results below the search box?
>
> I'm trying to implement a search box that will search solr as the user
> types, if that makes sense?
>
> Thanks,
>
> Rhys
>


Highlighting on typing in search box

2019-11-21 Thread rhys J
Are there any recommended APIs or code examples of using Solr and then
highlighting results below the search box?

I'm trying to implement a search box that will search solr as the user
types, if that makes sense?

Thanks,

Rhys


RE: Highlighting Solr 8

2019-10-16 Thread Eric Allen
Thanks for the reply.

Currently we are migrating from Solr 4 to Solr 8. Under Solr 4 we wrote our own 
highlighter because the provided one was too slow for our documents.

We deal with many large documents, but we have full term vectors already.  So 
as I understand it from my reading of the code the unified highlighter should 
be fast even on these large documents.

The concern about alternate fields was if the highlighter was slow we could 
just return highlights from one field if they existed and if not then highlight 
the other fields.

From my research I'm leaning towards returning highlights from all the fields 
we are interested in because I feel it will be fast.

Eric Allen - Software Developer, NetDocuments
eric.al...@netdocuments.com | O: 801.989.9691 | C: 801.989.9691

-Original Message-
From: sasarun  
Sent: Wednesday, October 16, 2019 2:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting Solr 8

Hi Eric,

Unified highlighter does not have an option to provide alternate field when 
highlighting. That option is available with Original and fast vector 
highlighter. As indicated in the Solr documentation, Unified is the recommended 
method for highlighting to meet most of the use cases. Please do share more 
details in case you are facing any specific issue with highlighting. 

Thanks,

Arun 




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Highlighting Solr 8

2019-10-16 Thread sasarun
Hi Eric,

Unified highlighter does not have an option to provide alternate field when
highlighting. That option is available with Original and fast vector
highlighter. As indicated in the Solr documentation, Unified is the
recommended method for highlighting to meet most of the use cases. Please do
share more details in case you are facing any specific issue with
highlighting. 

Thanks,

Arun 




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Highlighting Solr 8

2019-10-09 Thread Eric Allen
Use case

I am querying a catchall field and then would like to highlight that term in 3 
other fields, say a, b, and c.  I already have full term vectors.
From my reading and limited testing the fastest choice would be
hl.method unified
hl.termVectors true
hl.termPositions true
hl.termOffsets true

Does anyone know the Big O of the unified highlighter?
Can you use alternate fields with the unified highlighter, e.g. highlight field 
a and, if there isn't a match, highlight fields b and c?
Am I going down the wrong path? I'm new to Solr; should I be doing something 
different?

Thanks!




Eric Allen
Software Developer, NetDocuments
2500 Executive Parkway, Suite 300
Lehi, Utah 84043

Office: 801.989.9697
C: 801.989.9691

eric.al...@netdocuments.com
www.netdocuments.com





Re: Solutio for long time highlighting

2019-08-30 Thread David Smiley
Ah, multi-threaded highlighting.  I implemented that once as a precursor to
ultimately other better things -- the UnifiedHighlighter.

Your ExecutorService ought to be a field on the handler.  In inform() you
can call SolrCore.addCloseHook to ensure this executor is shut down.
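
As a rough sketch of that advice, the class below keeps one long-lived pool as
a field and cancels a task that runs past its time limit. It deliberately uses
only java.util.concurrent, so everything in it (class name, pool size, method
names) is an assumption for illustration; in a real component close() would be
called from the close hook mentioned above. Keep in mind that a timed-out task
only stops if it reacts to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not a Solr class: runs tasks with a time limit on a shared pool. */
public class TimeLimitedRunner implements AutoCloseable {

    // One pool for the lifetime of the component instead of one executor per request.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    /** Returns the task result, or fallback if the task does not finish in time. */
    public <T> T run(Callable<T> task, long timeoutMs, T fallback) {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Call this once, from whatever shutdown hook the host application provides. */
    @Override
    public void close() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        try (TimeLimitedRunner runner = new TimeLimitedRunner()) {
            System.out.println(runner.run(() -> "fast", 1000, "timed out"));
            System.out.println(runner.run(() -> { Thread.sleep(5000); return "slow"; }, 100, "timed out"));
        }
    }
}
```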

I suggest looking at this presentation from a few years ago I did with
Bloomberg at Lucene/Solr Revolution:
https://www.youtube.com/watch?v=tv5qKDKW8kk=14s
The UnifiedHighlighter is not enabled by default.  See the documentation:
https://builds.apache.org/job/Solr-reference-guide-master/javadoc/highlighting.html

Still... there is perhaps some value in multi-threading the highlighting
for huge docs, but I think we ultimately found no need after re-engineering
the highlighter.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Aug 28, 2019 at 10:36 AM SOLR4189  wrote:

> Hi all.
>
> In our team we thought about some tricky solution for queries with long
> time highlighting. For example, highlighting that takes more than 25
> seconds. So, we created our component that wraps highlighting component of
> SOLR in this way:
>
> public void inform(SolrCore core) {
>     . . . .
>     subSearchComponent = core.getSearchComponent("highlight");
>     . . . .
> }
>
> public void process(ResponseBuilder rb) throws Exception {
>     long timeout = 25000; // milliseconds
>     ExecutorService exec = null;
>     try {
>         exec = Executors.newSingleThreadExecutor();
>         // Run the wrapped highlight component on a worker thread.
>         Future<Exception> future = exec.submit(() -> {
>             try {
>                 subSearchComponent.process(rb);
>             } catch (IOException e) {
>                 return e;
>             }
>             return null;
>         });
>         Exception ex = future.get(timeout, TimeUnit.MILLISECONDS);
>         if (ex != null) {
>             throw ex;
>         }
>     } catch (TimeoutException toe) {
>         . . . .
>     } catch (Exception e) {
>         throw new IOException(e);
>     } finally {
>         if (exec != null) {
>             exec.shutdownNow();
>         }
>     }
> }
>
> This solution works, but sometimes we see that searchers stay open and as a
> result our RAM usage is pretty high (it looks like a memory leak of
> SolrIndexSearcher objects). They disappear only after a SOLR service
> restart.
>
> What do you think about this solution?
> Is there perhaps a built-in function for this?
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Solutio for long time highlighting

2019-08-28 Thread SOLR4189
Hi all.

In our team we came up with a somewhat tricky solution for queries whose
highlighting takes a long time, for example more than 25 seconds. We created
our own component that wraps the highlighting component of SOLR in this
way:

public void inform(SolrCore core) {
    . . . .
    subSearchComponent = core.getSearchComponent("highlight");
    . . . .
}

public void process(ResponseBuilder rb) throws Exception {
    long timeout = 25000; // milliseconds
    ExecutorService exec = null;
    try {
        exec = Executors.newSingleThreadExecutor();
        // Run the wrapped highlight component on a worker thread.
        Future<Exception> future = exec.submit(() -> {
            try {
                subSearchComponent.process(rb);
            } catch (IOException e) {
                return e;
            }
            return null;
        });
        Exception ex = future.get(timeout, TimeUnit.MILLISECONDS);
        if (ex != null) {
            throw ex;
        }
    } catch (TimeoutException toe) {
        . . . .
    } catch (Exception e) {
        throw new IOException(e);
    } finally {
        if (exec != null) {
            exec.shutdownNow();
        }
    }
}

This solution works, but sometimes we see that searchers stay open and as a
result our RAM usage is pretty high (it looks like a memory leak of
SolrIndexSearcher objects). They disappear only after a SOLR service restart.

What do you think about this solution?
Is there perhaps a built-in function for this?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr 8 Nested Documents Highlighting

2019-08-21 Thread Eichstädt , Konrad
Hi Everybody,

we make heavy use of the nested documents feature in Apache Solr. In 
version 8 we noticed that highlighting, which worked in version 7, no longer 
works properly: it works only for the root parent element but not for child 
documents. Is this a known open issue, or should I open a ticket that 
describes the problem in more detail?

Many Thanks

Best regards

Konrad

Konrad Eichstädt

Abt. Informations- und Datenmanagement
IDM 2.2 Management fachspezifischer Nachweissysteme
Softwarearchitekt

Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
Potsdamer Straße 33
10785 Berlin
Tel: +49 030 266 43



Re: More Highlighting details

2019-07-25 Thread govind nitk
Hi Furkan KAMACI,

Thanks for your reply. I will look at the custom transformer options.
I was looking for any way (it can also be debug output) to tell from the
highlighting response whether a field actually matched or the defaultSummary
was returned.
Anyway, I will post my findings on custom transformers and whether they
solve this.

Best Regards,
Govind

On Thu, Jul 25, 2019 at 11:45 PM Furkan KAMACI 
wrote:

> Hi Govind,
>
> Highlighting is the easiest way to detect it. You can find a similar
> question here:
>
> https://stackoverflow.com/questions/9629147/how-to-return-column-that-matched-the-query-in-solr
>
> Kind Regards,
> Furkan KAMACI
>
> On Wed, Jul 24, 2019 at 9:28 PM govind nitk  wrote:
>
> > Hi Furkan KAMACI,
> >
> > Thanks for your thoughts on maxAnalyzedChars.
> >
> > So, how can we get whether its matched or not? Is there any way to get
> such
> > data from extra payload in response from solr ?
> >
> > Thanks and regards
> > Govind
> >
> > On Wed, Jul 24, 2019 at 8:43 PM Furkan KAMACI 
> > wrote:
> >
> > > Hi Govind,
> > >
> > > Using *hl.tag.pre* and *hl.tag.post* may help you. However you should
> > keep
> > > in mind that even such term exists in desired field, highlighter can
> use
> > > fallback field due to *hl.maxAnalyzedChars* parameter.
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> > > On Wed, Jul 24, 2019 at 8:24 AM govind nitk 
> > wrote:
> > >
> > > > Hi all,
> > > > How about using hl.tag pre and post. If these are present then there
> is
> > > > actually field match otherwise its default summary ?
> > > > Will it work or there are some cases where it will not ?
> > > >
> > > >
> > > > Thanks in advance.
> > > >
> > > >
> > > >
> > > > On Tue, Jul 23, 2019 at 5:48 PM govind nitk 
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > How to get more details for highlighting ?
> > > > >
> > > > > I am using
> > > > >
> > > >
> > >
> >
> hl.method=unified&=title,url,paragraph=true=true
> > > > >
> > > > > So, if query words not matched, I am getting defaultSummary, which
> is
> > > > > great. *Can we get more info saying whether it found matches or
> > default
> > > > > summary? How to get such information?*
> > > > > Also is it good idea to use highlighting on urls ? Will urls get
> > > > distorted
> > > > > by any chance ?
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Govind
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: More Highlighting details

2019-07-25 Thread Furkan KAMACI
Hi Govind,

Highlighting is the easiest way to detect it. You can find a similar
question here:
https://stackoverflow.com/questions/9629147/how-to-return-column-that-matched-the-query-in-solr

Kind Regards,
Furkan KAMACI

On Wed, Jul 24, 2019 at 9:28 PM govind nitk  wrote:

> Hi Furkan KAMACI,
>
> Thanks for your thoughts on maxAnalyzedChars.
>
> So, how can we get whether its matched or not? Is there any way to get such
> data from extra payload in response from solr ?
>
> Thanks and regards
> Govind
>
> On Wed, Jul 24, 2019 at 8:43 PM Furkan KAMACI 
> wrote:
>
> > Hi Govind,
> >
> > Using *hl.tag.pre* and *hl.tag.post* may help you. However you should
> keep
> > in mind that even such term exists in desired field, highlighter can use
> > fallback field due to *hl.maxAnalyzedChars* parameter.
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Wed, Jul 24, 2019 at 8:24 AM govind nitk 
> wrote:
> >
> > > Hi all,
> > > How about using hl.tag pre and post. If these are present then there is
> > > actually field match otherwise its default summary ?
> > > Will it work or there are some cases where it will not ?
> > >
> > >
> > > Thanks in advance.
> > >
> > >
> > >
> > > On Tue, Jul 23, 2019 at 5:48 PM govind nitk 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > How to get more details for highlighting ?
> > > >
> > > > I am using
> > > >
> > >
> >
> hl.method=unified&=title,url,paragraph=true=true
> > > >
> > > > So, if query words not matched, I am getting defaultSummary, which is
> > > > great. *Can we get more info saying whether it found matches or
> default
> > > > summary? How to get such information?*
> > > > Also is it good idea to use highlighting on urls ? Will urls get
> > > distorted
> > > > by any chance ?
> > > >
> > > >
> > > > Best Regards,
> > > > Govind
> > > >
> > > >
> > >
> >
>


Re: More Highlighting details

2019-07-24 Thread govind nitk
Hi Furkan KAMACI,

Thanks for your thoughts on maxAnalyzedChars.

So, how can we get whether its matched or not? Is there any way to get such
data from extra payload in response from solr ?

Thanks and regards
Govind

On Wed, Jul 24, 2019 at 8:43 PM Furkan KAMACI 
wrote:

> Hi Govind,
>
> Using *hl.tag.pre* and *hl.tag.post* may help you. However you should keep
> in mind that even such term exists in desired field, highlighter can use
> fallback field due to *hl.maxAnalyzedChars* parameter.
>
> Kind Regards,
> Furkan KAMACI
>
> On Wed, Jul 24, 2019 at 8:24 AM govind nitk  wrote:
>
> > Hi all,
> > How about using hl.tag pre and post. If these are present then there is
> > actually field match otherwise its default summary ?
> > Will it work or there are some cases where it will not ?
> >
> >
> > Thanks in advance.
> >
> >
> >
> > On Tue, Jul 23, 2019 at 5:48 PM govind nitk 
> wrote:
> >
> > > Hi all,
> > >
> > > How to get more details for highlighting ?
> > >
> > > I am using
> > >
> >
> hl.method=unified&=title,url,paragraph=true=true
> > >
> > > So, if query words not matched, I am getting defaultSummary, which is
> > > great. *Can we get more info saying whether it found matches or default
> > > summary? How to get such information?*
> > > Also is it good idea to use highlighting on urls ? Will urls get
> > distorted
> > > by any chance ?
> > >
> > >
> > > Best Regards,
> > > Govind
> > >
> > >
> >
>


Re: More Highlighting details

2019-07-24 Thread Furkan KAMACI
Hi Govind,

Using *hl.tag.pre* and *hl.tag.post* may help you. However, you should keep
in mind that even if such a term exists in the desired field, the highlighter
can still use the fallback field due to the *hl.maxAnalyzedChars* parameter.
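
A minimal sketch of that check on the client side could look like the
following. It assumes the snippets have already been read from the response
and that the tags passed in are the ones configured for the request (the
unified highlighter uses <em> and </em> unless told otherwise); the class name
is made up. If the stored text can itself contain the tag, choosing an unusual
marker avoids false positives.

```java
import java.util.List;

/** Hypothetical client-side check: real match or default summary? */
public class HighlightMatchCheck {

    /** True if any snippet contains the pre tag followed later by the post tag. */
    static boolean hasMatch(List<String> snippets, String preTag, String postTag) {
        for (String snippet : snippets) {
            int start = snippet.indexOf(preTag);
            if (start >= 0 && snippet.indexOf(postTag, start + preTag.length()) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasMatch(List.of("Intro to <em>Solr</em> search"), "<em>", "</em>")); // true
        System.out.println(hasMatch(List.of("Leading text of the field"), "<em>", "</em>"));     // false
    }
}
```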

Kind Regards,
Furkan KAMACI

On Wed, Jul 24, 2019 at 8:24 AM govind nitk  wrote:

> Hi all,
> How about using hl.tag pre and post. If these are present then there is
> actually field match otherwise its default summary ?
> Will it work or there are some cases where it will not ?
>
>
> Thanks in advance.
>
>
>
> On Tue, Jul 23, 2019 at 5:48 PM govind nitk  wrote:
>
> > Hi all,
> >
> > How to get more details for highlighting ?
> >
> > I am using
> >
> hl.method=unified&=title,url,paragraph=true=true
> >
> > So, if query words not matched, I am getting defaultSummary, which is
> > great. *Can we get more info saying whether it found matches or default
> > summary? How to get such information?*
> > Also is it good idea to use highlighting on urls ? Will urls get
> distorted
> > by any chance ?
> >
> >
> > Best Regards,
> > Govind
> >
> >
>


Re: More Highlighting details

2019-07-23 Thread govind nitk
Hi all,
How about using hl.tag.pre and hl.tag.post? If these are present then there is
actually a field match; otherwise it is the default summary?
Will it work, or are there cases where it will not?


Thanks in advance.



On Tue, Jul 23, 2019 at 5:48 PM govind nitk  wrote:

> Hi all,
>
> How to get more details for highlighting ?
>
> I am using
> hl.method=unified&=title,url,paragraph=true=true
>
> So, if query words not matched, I am getting defaultSummary, which is
> great. *Can we get more info saying whether it found matches or default
> summary? How to get such information?*
> Also is it good idea to use highlighting on urls ? Will urls get distorted
> by any chance ?
>
>
> Best Regards,
> Govind
>
>


More Highlighting details

2019-07-23 Thread govind nitk
Hi all,

How to get more details for highlighting ?

I am using
hl.method=unified&=title,url,paragraph=true=true

So, if the query words are not matched, I get the defaultSummary, which is
great. *Can we get more information indicating whether it found matches or
returned the default summary? How can we get such information?*
Also, is it a good idea to use highlighting on URLs? Could the URLs get
distorted by any chance?
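
For reference, a sketch of how a request like the one above could be assembled in Python. The parameter names in the message were lost in the archive, so `hl`, `hl.fl` and `hl.defaultSummary` here are an assumption about what was originally sent; the base URL and core name are hypothetical as well.

```python
from urllib.parse import urlencode


def build_highlight_query(query, fields,
                          base_url="http://localhost:8983/solr/mycore/select"):
    """Build a select URL that asks the unified highlighter for snippets."""
    params = {
        "q": query,
        "hl": "true",                  # assumed: switch highlighting on
        "hl.method": "unified",        # from the original message
        "hl.fl": ",".join(fields),     # assumed name for the field list
        "hl.defaultSummary": "true",   # assumed: fall back to a summary
    }
    return base_url + "?" + urlencode(params)


url = build_highlight_query("solr", ["title", "url", "paragraph"])
```

`urlencode` percent-encodes the commas in the field list, which Solr decodes again on its side.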


Best Regards,
Govind


RE: highlighting not working as expected

2019-07-01 Thread Martin Frank Hansen (MHQ)
Hi Edwin,

Thanks for your explanation, makes sense now.

Best regards

Martin


Internal - KMD A/S

-Original Message-
From: Zheng Lin Edwin Yeo 
Sent: 30 June 2019 01:57
To: solr-user@lucene.apache.org
Subject: Re: highlighting not working as expected

Hi,

If you are using the type "string", it will require exact match, including 
space and upper/lower case.

You can use the type "text" for a start, but further down the road it will be 
good to have your own custom fieldType with your own tokenizer and filter.

Regards,
Edwin

On Tue, 25 Jun 2019 at 14:52, Martin Frank Hansen (MHQ)  wrote:

> Hi again,
>
> I have tested a bit and I was wondering if the highlighter requires a
> field to be of type "text"? Whenever I try highlighting on fields
> which are of type "string" nothing gets returned.
>
> Best regards
>
> Martin
>
>
> Internal - KMD A/S
>
> -Original Message-
> From: Jörn Franke 
> Sent: 11 June 2019 08:45
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting not working as expected
>
> Could it be a stop word ? What is the exact type definition of those
> fields? Could this word be omitted or with wrong encoding during
> loading of the documents?
>
> > On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) wrote:
> >
> > Hi,
> >
> > I am having some difficulties making highlighting work. For some
> > reason
> the highlighting feature only works on some fields but not on other
> fields even though these fields are stored.
> >
> > An example of a request looks like this:
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagst
> itel=Sagstitel=%3C/b%3E=%3Cb%3E=
> on=rotte
> >
> > It simply returns an empty set, for all documents even though I can
> > see
> several documents which have “Sagstitel” containing the word “rotte”
> (rotte=rat).  What am I missing here?
> >

Re: highlighting not working as expected

2019-06-29 Thread Zheng Lin Edwin Yeo
Hi,

If you are using the type "string", it will require an exact match, including
spaces and upper/lower case.

You can use the type "text" for a start, but further down the road it will
be good to have your own custom fieldType with your own tokenizer and
filter.
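
The difference can be illustrated with a toy model in Python. This is not Solr's analysis chain, only a rough stand-in: the "string" case compares the whole value verbatim, and the "text" case splits on whitespace and lowercases, which is an assumption about what a simple text fieldType would do. Both function names are hypothetical.

```python
def string_field_matches(stored: str, query: str) -> bool:
    """Model of a string field: the whole value is one token, compared as-is."""
    return stored == query


def text_field_matches(stored: str, query: str) -> bool:
    """Model of a text field: split on whitespace, lowercase, match any token."""
    tokens = stored.lower().split()
    return query.lower() in tokens


title = "Rotte i kælderen"
print(string_field_matches(title, "rotte"))  # whole value differs from the query
print(text_field_matches(title, "rotte"))    # one lowercased token equals the query
```

With the string model only the full, identically cased value matches, which is consistent with highlighting returning nothing for a single word on such a field.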

Regards,
Edwin

On Tue, 25 Jun 2019 at 14:52, Martin Frank Hansen (MHQ)  wrote:

> Hi again,
>
> I have tested a bit and I was wondering if the highlighter requires a
> field to be of type "text"? Whenever I try highlighting on fields which are
> of type "string" nothing gets returned.
>
> Best regards
>
> Martin
>
>
> Internal - KMD A/S
>
> -Original Message-
> From: Jörn Franke 
> Sent: 11 June 2019 08:45
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting not working as expected
>
> Could it be a stop word ? What is the exact type definition of those
> fields? Could this word be omitted or with wrong encoding during loading of
> the documents?
>
> > On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) wrote:
> >
> > Hi,
> >
> > I am having some difficulties making highlighting work. For some reason
> the highlighting feature only works on some fields but not on other fields
> even though these fields are stored.
> >
> > An example of a request looks like this:
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel=Sagstitel=%3C/b%3E=%3Cb%3E=on=rotte
> >
> > It simply returns an empty set, for all documents even though I can see
> several documents which have “Sagstitel” containing the word “rotte”
> (rotte=rat).  What am I missing here?
> >

RE: highlighting not working as expected

2019-06-25 Thread Martin Frank Hansen (MHQ)
Hi again,

I have tested a bit and I was wondering if the highlighter requires a field to 
be of type "text"? Whenever I try highlighting on fields which are of type 
"string" nothing gets returned.

Best regards

Martin


Internal - KMD A/S

-Original Message-
From: Jörn Franke 
Sent: 11 June 2019 08:45
To: solr-user@lucene.apache.org
Subject: Re: highlighting not working as expected

Could it be a stop word ? What is the exact type definition of those fields? 
Could this word be omitted or with wrong encoding during loading of the 
documents?

> On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) wrote:
>
> Hi,
>
> I am having some difficulties making highlighting work. For some reason the 
> highlighting feature only works on some fields but not on other fields even 
> though these fields are stored.
>
> An example of a request looks like this: 
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel=Sagstitel=%3C/b%3E=%3Cb%3E=on=rotte
>
> It simply returns an empty set, for all documents even though I can see 
> several documents which have “Sagstitel” containing the word “rotte” 
> (rotte=rat).  What am I missing here?
>


RE: highlighting not working as expected

2019-06-17 Thread Martin Frank Hansen (MHQ)
Hi Edwin,

Yes the field is defined just like the other fields:



BR
Martin


Internal - KMD A/S

-Original Message-
From: Zheng Lin Edwin Yeo 
Sent: 4 June 2019 10:32
To: solr-user@lucene.apache.org
Subject: Re: highlighting not working as expected

Hi Martin,

What fieldType are you using for the field “Sagstitel”? Is it the same as other 
fields?

Regards,
Edwin

On Mon, 3 Jun 2019 at 16:06, Martin Frank Hansen (MHQ)  wrote:

> Hi,
>
> I am having some difficulties making highlighting work. For some
> reason the highlighting feature only works on some fields but not on
> other fields even though these fields are stored.
>
> An example of a request looks like this:
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagst
> itel=Sagstitel=%3C/b%3E=%3Cb%3E=
> on=rotte
>
> It simply returns an empty set, for all documents even though I can
> see several documents which have “Sagstitel” containing the word “rotte”
> (rotte=rat).  What am I missing here?
>


RE: highlighting not working as expected

2019-06-17 Thread Martin Frank Hansen (MHQ)
Hi Jörn,

Thanks for your input!

I do not use stop words, so that should not be the issue. The encoding of the
documents might be an issue, as they come in many different file formats. I
will, however, need to test this.

The field is defined as below:



BR

Martin


Internal - KMD A/S

-Original Message-
From: Jörn Franke 
Sent: 11 June 2019 08:45
To: solr-user@lucene.apache.org
Subject: Re: highlighting not working as expected

Could it be a stop word ? What is the exact type definition of those fields? 
Could this word be omitted or with wrong encoding during loading of the 
documents?

> On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) wrote:
>
> Hi,
>
> I am having some difficulties making highlighting work. For some reason the 
> highlighting feature only works on some fields but not on other fields even 
> though these fields are stored.
>
> An example of a request looks like this: 
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel=Sagstitel=%3C/b%3E=%3Cb%3E=on=rotte
>
> It simply returns an empty set, for all documents even though I can see 
> several documents which have “Sagstitel” containing the word “rotte” 
> (rotte=rat).  What am I missing here?
>


Re: highlighting not working as expected

2019-06-11 Thread Jörn Franke
Could it be a stop word? What is the exact type definition of those fields?
Could this word have been omitted, or loaded with the wrong encoding, when the
documents were loaded?

> On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) wrote:
> 
> Hi,
> 
> I am having some difficulties making highlighting work. For some reason the 
> highlighting feature only works on some fields but not on other fields even 
> though these fields are stored.
> 
> An example of a request looks like this: 
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel=Sagstitel=%3C/b%3E=%3Cb%3E=on=rotte
> 
> It simply returns an empty set, for all documents even though I can see 
> several documents which have “Sagstitel” containing the word “rotte” 
> (rotte=rat).  What am I missing here?
> 


RE: highlighting not working as expected

2019-06-11 Thread Martin Frank Hansen (MHQ)
Hi David,

Thanks for your response, and sorry for my late reply.

Still the same result when using hl.method=unified.

Best regards
Martin


Internal - KMD A/S

-Original Message-
From: David Smiley 
Sent: 10 June 2019 16:48
To: solr-user 
Subject: Re: highlighting not working as expected

Please try hl.method=unified and tell us if that helps.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 3, 2019 at 4:06 AM Martin Frank Hansen (MHQ)  wrote:

> Hi,
>
> I am having some difficulties making highlighting work. For some
> reason the highlighting feature only works on some fields but not on
> other fields even though these fields are stored.
>
> An example of a request looks like this:
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagst
> itel=Sagstitel=%3C/b%3E=%3Cb%3E=
> on=rotte
>
> It simply returns an empty set, for all documents even though I can
> see several documents which have “Sagstitel” containing the word “rotte”
> (rotte=rat).  What am I missing here?
>


Re: highlighting not working as expected

2019-06-10 Thread David Smiley
Please try hl.method=unified and tell us if that helps.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 3, 2019 at 4:06 AM Martin Frank Hansen (MHQ)  wrote:

> Hi,
>
> I am having some difficulties making highlighting work. For some reason
> the highlighting feature only works on some fields but not on other fields
> even though these fields are stored.
>
> An example of a request looks like this:
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel=Sagstitel=%3C/b%3E=%3Cb%3E=on=rotte
>
> It simply returns an empty set, for all documents even though I can see
> several documents which have “Sagstitel” containing the word “rotte”
> (rotte=rat).  What am I missing here?
>


Re: highlighting not working as expected

2019-06-04 Thread Zheng Lin Edwin Yeo
Hi Martin,

What fieldType are you using for the field “Sagstitel”? Is it the same as
other fields?

Regards,
Edwin

On Mon, 3 Jun 2019 at 16:06, Martin Frank Hansen (MHQ)  wrote:

> Hi,
>
> I am having some difficulties making highlighting work. For some reason
> the highlighting feature only works on some fields but not on other fields
> even though these fields are stored.
>
> An example of a request looks like this:
> http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel=Sagstitel=%3C/b%3E=%3Cb%3E=on=rotte
>
> It simply returns an empty set, for all documents even though I can see
> several documents which have “Sagstitel” containing the word “rotte”
> (rotte=rat).  What am I missing here?
>


highlighting not working as expected

2019-06-03 Thread Martin Frank Hansen (MHQ)
Hi,

I am having some difficulties making highlighting work. For some reason the 
highlighting feature only works on some fields but not on other fields even 
though these fields are stored.

An example of a request looks like this: 
http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel&hl.fl=Sagstitel&hl.simple.post=%3C/b%3E&hl.simple.pre=%3Cb%3E&hl=on&q=rotte

It simply returns an empty set for all documents, even though I can see several 
documents in which “Sagstitel” contains the word “rotte” (rotte = rat). What 
am I missing here?

I am using the standard highlighter as below.




  
  
  

  100

  

  
  

  
  70
  
  0.5
  
  [-\w ,/\n\]{20,200}

  

  
  

  b
  /b

  

  
  

  
  

  
  

  
 

  
  

  

  
  

  
  

  

  

  10
  .,!? 

  

  

  
  WORD
  
  
  da

  

  

Hope that some one can help, thanks in advance.

Best regards
Martin



Internal - KMD A/S



Re: Highlighting

2019-04-15 Thread Shawn Heisey

On 4/15/2019 11:36 AM, Mike Phillips wrote:

I don't understand why highlighting does not return anything but the document 
id.
I created a core imported all my data, everything seems like it should be 
working.
 From reading the documentation I expect it to show me highlight information for
assetName around Potter, but I never get anything but the document id (assetId)


I'll caution you that I do not know very much about highlighting at all. 
 I've only toyed with it, have never actually USED it for real.


But I do think you need the fields you're interested in doing 
highlighting on in the hl.fl parameter.  This parameter is not shown in 
the output you shared.


I also think that the fields you want to highlight on must be included 
in the query, so that position data is found, but I am not positive 
about this -- I could be wrong.


If you share the entry from solr.log where the query is logged, we will 
be able to see ALL of the parameters used for that query, even if they 
are not echoed in the results.  You could also add "echoParams=all" to 
the parameter list to see all of the parameters in the response.
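
The two suggestions above (naming the field in hl.fl and adding echoParams=all) can be sketched as a small request builder. This is only an illustration in Python's standard library: the host, port and core name `mycore` are hypothetical, while `assetName` and `Potter` come from the original question.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def build_highlight_url(base_url, query, highlight_fields):
    """Build a /select URL that names the fields to highlight and echoes all params."""
    params = {
        "q": query,
        "hl": "on",
        "hl.fl": ",".join(highlight_fields),  # fields to highlight
        "echoParams": "all",                  # echo every parameter in the response header
    }
    return base_url + "?" + urlencode(params)

# Hypothetical host and core name; the field and term are from the question.
url = build_highlight_url("http://localhost:8983/solr/mycore/select",
                          "assetName:Potter", ["assetName"])
print(url)
# Decode the query string again to confirm what will be sent.
print(parse_qs(urlsplit(url).query))
```

Field-qualifying the query (`assetName:Potter`) follows the guess above that the highlighted field should also appear in the query; that guess is not confirmed in this thread.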


Thanks,
Shawn


Highlighting

2019-04-15 Thread Mike Phillips
I don't understand why highlighting does not return anything but the document 
id.
I created a core and imported all my data; everything seems like it should be 
working.
From reading the documentation I expect it to show me highlight information for
assetName around Potter, but I never get anything but the document id (assetId).

Here are the entries I added to managed-schema for assetName. Does anybody know what
I should be targeting as a solution?




#- My query and results -#




  0
  1
  
Potter
on
/EM

  clientId:11
  assetTypeId:1

EM
1555116259160
  


  
3
Harry Potter and The Order of The Phoenix.mov
2012-09-27T02:34:27Z
Quicktime with Audio
1
11
Level 3
27
27
Harry Potter and The Order of The Phoenix.mov
23.976
Non drop frame
2.35
16
4:2:0
1624857205677228032
  
3
StudelyCastleHotel
2019-04-12T22:57:33Z
JPEG
1
11
Level 3
10130
10130

  Producer


  Michael Potter


  7

StudelyCastleHotel
1630650901282684928

  
  



Range query on multivalued string field results in useless highlighting

2019-03-22 Thread Wolf, Karl (NIH/NLM/LHC) [C]
Range queries against multivalued string fields produce useless highlighting, 
even though "hl.highlightMultiTerm":"true"

I have uncovered what I believe is a bug. At the very least it is a difference 
in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a Field defined in my schema as:




I am using a query containing a Range clause and I am using highlighting to get 
the list of values that match the range query.

All examples below were using the appropriate Solr Admin Server Query page.

The range query using Solr v5.1.0 produces CORRECT and useful results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 366,
    "params": {
      "q": "ResourceCorrespondent:[A TO B}",
      "hl": "true",
      "indent": "true",
      "hl.preserveMulti": "true",
      "fl": "ResourceCorrespondent,ResourceID",
      "hl.requireFieldMatch": "true",
      "hl.usePhraseHighlighter": "true",
      "hl.fl": "ResourceCorrespondent",
      "wt": "json",
      "hl.highlightMultiTerm": "true",
      "_": "1553275722025"
    }
  },
  "response": {
    "numFound": 999,
    "start": 0,
    "docs": [
      {
        "ResourceCorrespondent": [
          "Stanley, Wendell M.",
          "Avery, Roy"
        ],
        "ResourceID": "CCAAHG"
      },
      {
        "ResourceCorrespondent": [
          "Avery, Roy"
        ],
        "ResourceID": "CCGMDS"
      },
... lots more docs, then
    ]
  },
... we get to the highlighting portion of the response
... this tells me which values of each ResourceCorrespondent field
... actually matching the query

  "highlighting": {
    "CCAAHG": {
      "ResourceCorrespondent": [
        "Avery, Roy"
      ]
    },
    "CCGMDS": {
      "ResourceCorrespondent": [
        "Avery, Roy"
      ]
    },
    "BBACKV": {
      "ResourceCorrespondent": [
        "American Institute of Biological Sciences",
        "Albritton, Errett C."
      ]
    },
... lots more useful highlight values. Note two matching values
... for document BBACKV.
  }

***
***
However, using the exact same parameters with Solr v7.5.0 or v7.7.1, the top 
portion of the response is basically the same, including the number of 
documents found:

{
  "responseHeader":{
    "status":0,
    "QTime":245,
    "params":{
      "q":"ResourceCorrespondent:[A TO B}",
      "hl":"on",
      "hl.preserveMulti":"true",
      "fl":"ResourceID, ResourceCorrespondent",
      "hl.requireFieldMatch":"true",
      "hl.fl":"ResourceCorrespondent",
      "hightlightMultiTerm":"true",
      "wt":"json",
      "_":"1553105129887",
      "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

The documents are in a different order, but that doesn't matter.

The problem is with the highlighting, which is effectively empty. I don't know 
which values in each document actually matched the query:

  "highlighting":{
    "QQBBLX":{},
    "QQBCLN":{},
    "QQBCLM":{},
... etc.

*** NOTE: The data is the same for all Solr versions and the Solr indexes were 
rebuilt
for each Solr version.

***
Changing to "hl.method=unified", the highlighting looks like:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":[]},
    "QQBCLN":{
      "ResourceCorrespondent":[]},
    "QQBCLM":{
      "ResourceCorrespondent":[]},

*** Closer but still no useful values

***
NOTE: if I change only the query to be a wildcard query to 
q="ResourceCorrespondent:A*"

the highlighting is correct in both Solr v7.5.0 and v7.7.1:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":["American Public Health Association"]},
    "QQBCLN":{
      "ResourceCorrespondent":["Abram, Morris B."]},
    "QQBCLM":{
      "ResourceCorrespondent":["Abram, Morris B."]},
... etc.

*** This makes me think there is some problem with a Range query feeding the
Highlighter code.

***
No variation of the hl settings or other query parameters fixes the problem.
The wildcard query is my current workaround, but there still is a problem with
range queries.

So there is some incompatibility among:

1) A multivalued string field AND
2) A range query against that field AND
3) Highlighting

The highlight portion of the response is effectively "empty"
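
The two echoed "params" blocks above are not literally identical, and a short script makes the difference easy to see. This is only a sketch that compares the parameter names copied from the two responses; whether the differing names have anything to do with the empty highlighting is not established in this message.

```python
def diff_param_names(expected, actual):
    """Return (names only in expected, names only in actual), each sorted."""
    only_expected = sorted(set(expected) - set(actual))
    only_actual = sorted(set(actual) - set(expected))
    return only_expected, only_actual

# Parameter names echoed by the v5.1.0 response above.
v5_params = ["q", "hl", "indent", "hl.preserveMulti", "fl", "hl.requireFieldMatch",
             "hl.usePhraseHighlighter", "hl.fl", "wt", "hl.highlightMultiTerm", "_"]
# Parameter names echoed by the v7.5.0 / v7.7.1 response above.
v7_params = ["q", "hl", "hl.preserveMulti", "fl", "hl.requireFieldMatch", "hl.fl",
             "hightlightMultiTerm", "wt", "_", "usePhraseHighLighter"]

missing, extra = diff_param_names(v5_params, v7_params)
print("only in the v5.1.0 request:", missing)
print("only in the v7.x request:", extra)
```

Sending the hl.* names from the v5.1.0 request explicitly (for example through a hand-written URL) would rule this difference in or out.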

I don't know when this issue was first introduced. I have recently been 
updating from 5.1.0
to 7.5.0 in one big leap. I have attempted to read through the change logs for 
the intervening
versions but I gave up to save my sanity.

--Karl


Highlighting Parent Documents

2018-12-09 Thread Nicolas Paris
Hi

I have read here [1] and here [2] that it is possible to highlight only
parent documents in block join queries, but I haven't succeeded yet.

So here is my nested document example:
[
{
"id": "2",
"type_s": "parent",
"content_txt": ["apache"],
"_childDocuments_": [
{
"id": "1",
"type_s": "child",
"content_txt": ["solr"]
}
]
}
]

Here is my query (= give me the documents that have a parent which contains the
"apache" term):
curl http://localhost:8983/solr/col/query -d '
fl=id
&hl=on
&hl.fl=*
&q={!child of="type_s:parent"}type_s:parent AND content_txt:apache'

And here is the result:
{
...
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"1"}]
  },
  "highlighting":{
"1":{}}}


I was hoping to get this (= doc 1 in the results, with the highlight taken from doc 2, the parent):

{
...
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"1"}]
  },
  "highlighting":{
"2":{
  "content_txt":["apache"]}}}


[1] 
http://lucene.472066.n3.nabble.com/Fwd-Standard-highlighting-doesn-t-work-for-Block-Join-td4260784.html
[2] 
http://lucene.472066.n3.nabble.com/highlighting-on-child-document-td4238236.html


Thanks in advance,

-- 
nicolas


highlighting more-like-this

2018-10-12 Thread Matt Work Coarr
I want to get highlighted results for more like this queries.  More like
this doesn't support highlighting.

So what I did was ran a more like this query (I have the source document A
and say I get three similar documents back A1, A2, and A3).  I then create
a second query where I use the contents of A as the query.

More specifically, I have a subset of my fields appended to a
multivalued "catchall" field.  I use A's concatenated catchall (with
punctuation removed) as the search:

q=catchall:(*CONCATENATED_A_CATCHALL_TEXT*)

And I limit the results to the three documents A1/A2/A3 via qf:

qf=id*:A1_ID*+id*:A2_ID*+id*:A3_ID*

Now I get highlighted results.  But my main problem is very frequent terms
(for/the/to/in...) are highlighted.  I would have thought these would be
excluded via inverse document frequency (since they show up in just about
every document).

Is there a way to improve the highlighting? (Remove the less important
terms, set some threshold, etc)
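
One client-side possibility, not something confirmed in this thread, is to drop very frequent terms from the concatenated text before it is used as the query, so they are never candidates for highlighting. A minimal sketch follows; the `FREQUENT_TERMS` set and the function name are hypothetical, and `catchall` is the field described above.

```python
import string

# Hypothetical list of very frequent terms; a real list would depend on the corpus.
FREQUENT_TERMS = {"for", "the", "to", "in", "a", "of", "and"}

def build_catchall_query(text, field="catchall", frequent_terms=FREQUENT_TERMS):
    """Strip punctuation, drop very frequent terms, and build a field-qualified query."""
    # Replace every ASCII punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = text.translate(table)
    # Keep only terms that are not in the frequent-term list (case-insensitive).
    terms = [t for t in cleaned.split() if t.lower() not in frequent_terms]
    return f"{field}:({' '.join(terms)})"

print(build_catchall_query("The quick, brown fox: jumps to the log in the forest."))
```

This changes which terms are searched as well as which are highlighted, so the results of the second query may differ from the more-like-this results.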

Matt


highlighting in more like this?

2018-09-18 Thread Matt Work Coarr
Is it possible to get highlighting in more like this queries?  My initial
attempts seem to indicate that it isn't possible (I've only attempted this
via modifying MLT query urls)

(I'm looking for something similar to hl=true&hl.fl=field1,field5,field6 in
a normal search)

Thanks,
Matt


Re: Highlighting is not working with docValues only String field

2018-08-13 Thread Karthik Ramachandran
I have opened JIRA https://issues.apache.org/jira/browse/SOLR-12663


On Sat, Aug 11, 2018 at 8:59 PM Erick Erickson 
wrote:

> I can see why it wouldn't and also why it could/should. I also wonder about
> SortableTextField, perhaps mention that too.
>
> Seems worth a JIRA to me if there isn't one already
>
> On Fri, Aug 10, 2018, 19:49 Karthik Ramachandran <
> kramachand...@commvault.com> wrote:
>
> > We are using Solr 7.2.1, highlighting is not working with docValues only
> > String field.
> >
> > Should I open a JIRA for this?
> >
> > Schema:
> > 
> >   id
> >   
> >> required="true"/>
> >> stored="true"/>
> >> stored="false"/>
> >   
> > 
> >
> > Data:
> > [{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line
> > 2"},{"id":3,"name":"Testing line 3"}]
> >
> > Query:
> >
> >
> http://localhost:8983/solr/test/select?q=Testing*&df=name&hl=true&hl.fl=name,name1
> >
> > Response:
> > {"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing
> line
> > 1","name1":"Testing line 1"},{"id":"2","name":"Testing line
> > 2","name1":"Testing line 2"},{"id":"3","name":"Testing line
> > 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing
> > line 1"]},"2":{"name":["Testing line
> > 2"]},"3":{"name":["Testing line 3"]}}}
> >
> >
> > With Thanks & Regards
> > Karthik Ramachandran
> > P Please don't print this e-mail unless you really need to
> >
> > ***Legal Disclaimer***
> > "This communication may contain confidential and privileged material for
> > the
> > sole use of the intended recipient. Any unauthorized review, use or
> > distribution
> > by others is strictly prohibited. If you have received the message by
> > mistake,
> > please advise the sender by reply email and delete the message. Thank
> you."
> > **
> >
>


-- 
With Thanks & Regards
Karthik Ramachandran

P Please don't print this e-mail unless you really need to


Re: Highlighting is not working with docValues only String field

2018-08-11 Thread Erick Erickson
I can see why it wouldn't and also why it could/should. I also wonder about
SortableTextField, perhaps mention that too.

Seems worth a JIRA to me if there isn't one already

On Fri, Aug 10, 2018, 19:49 Karthik Ramachandran <
kramachand...@commvault.com> wrote:

> We are using Solr 7.2.1, highlighting is not working with docValues only
> String field.
>
> Should I open a JIRA for this?
>
> Schema:
> 
>   id
>   
>required="true"/>
>stored="true"/>
>stored="false"/>
>   
> 
>
> Data:
> [{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line
> 2"},{"id":3,"name":"Testing line 3"}]
>
> Query:
>
> http://localhost:8983/solr/test/select?q=Testing*&df=name&hl=true&hl.fl=name,name1
>
> Response:
> {"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line
> 1","name1":"Testing line 1"},{"id":"2","name":"Testing line
> 2","name1":"Testing line 2"},{"id":"3","name":"Testing line
> 3","name1":"Testing line 3"}]},"highlighting":{"1":{"name":["Testing
> line 1"]},"2":{"name":["Testing line
> 2"]},"3":{"name":["Testing line 3"]}}}
>
>
> With Thanks & Regards
> Karthik Ramachandran
> P Please don't print this e-mail unless you really need to
>
>


Highlighting is not working with docValues only String field

2018-08-10 Thread Karthik Ramachandran
We are using Solr 7.2.1, and highlighting is not working with a docValues-only 
String field.

Should I open a JIRA for this?

Schema:

  id
  
  
  
  
  


Data:
[{"id":1,"name":"Testing line 1"},{"id":2,"name":"Testing line 
2"},{"id":3,"name":"Testing line 3"}]

Query:
http://localhost:8983/solr/test/select?q=Testing*&df=name&hl=true&hl.fl=name,name1

Response:
{"response":{"numFound":3,"start":0,"docs":[{"id":"1","name":"Testing line 
1","name1":"Testing line 1"},{"id":"2","name":"Testing line 2","name1":"Testing 
line 2"},{"id":"3","name":"Testing line 3","name1":"Testing line 
3"}]},"highlighting":{"1":{"name":["Testing line 
1"]},"2":{"name":["Testing line 2"]},"3":{"name":["Testing 
line 3"]}}}
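
In the response above only "name" appears under "highlighting", although the query string lists both name and name1. A small check like the following can report such gaps for any response; it is a sketch that assumes both fields were requested for highlighting, and the function name is hypothetical.

```python
import json

def fields_without_highlights(response, requested_fields):
    """Map each document id to the requested fields that have no highlight snippets."""
    missing = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        # A field counts as missing if it is absent or its snippet list is empty.
        absent = [f for f in requested_fields if not fields.get(f)]
        if absent:
            missing[doc_id] = absent
    return missing

# Abbreviated copy of the response above (first document only).
raw = '{"highlighting": {"1": {"name": ["Testing line 1"]}}}'
print(fields_without_highlights(json.loads(raw), ["name", "name1"]))
```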


With Thanks & Regards
Karthik Ramachandran
P Please don't print this e-mail unless you really need to



Re: Highlighting the search keywords

2018-08-07 Thread Renuka Srishti
First of all, thanks to all; it's a great community and an amazing experience to
work with Apache Solr.

I was trying to use the highlight component inside the suggest request handler by
listing the highlight component like this:

<arr name="components">
  <str>suggest</str>
  <str>highlight</str>
</arr>


So I can use suggestions and the highlighter at the same time, but it's not
working. Am I missing something?


Thanks
Renuka Srishti

On Wed 1 Aug, 2018, 12:05 Nicolas Franck,  wrote:

> Nope, that is how it works. It is not in place.
>
> > On 31 Jul 2018, at 21:57, Renuka Srishti 
> wrote:
> >
> > Hi All,
> >
> > I was using highlighting in solr, solr gives highlighting results within
> > the response but not included within the documents.
> > Am i missing something? Can i configure so that it can show highlighted
> > keywords matched within the documents.
> >
> > Thanks
> > Renuka Srishti
>
>


Re: Highlighting the search keywords

2018-08-01 Thread Nicolas Franck
Nope, that is how it works. It is not in place.
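
Since the highlights come back in a separate section keyed by document id, a client can attach them to the documents itself. A minimal sketch, assuming the JSON response format and a unique key field called `id`; the `_highlights` key and the sample data are made up for illustration.

```python
def merge_highlights(response, id_field="id"):
    """Return copies of the docs with their highlight snippets attached under '_highlights'."""
    highlighting = response.get("highlighting", {})
    merged = []
    for doc in response["response"]["docs"]:
        doc_copy = dict(doc)  # leave the original response untouched
        # The highlighting section is keyed by the unique key rendered as a string.
        doc_copy["_highlights"] = highlighting.get(str(doc[id_field]), {})
        merged.append(doc_copy)
    return merged

# Hypothetical response shaped like Solr's JSON output.
sample = {
    "response": {"docs": [{"id": "1", "title": "Apache Solr"}]},
    "highlighting": {"1": {"title": ["Apache <em>Solr</em>"]}},
}
print(merge_highlights(sample))
```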

> On 31 Jul 2018, at 21:57, Renuka Srishti  wrote:
> 
> Hi All,
> 
> I was using highlighting in solr, solr gives highlighting results within
> the response but not included within the documents.
> Am i missing something? Can i configure so that it can show highlighted
> keywords matched within the documents.
> 
> Thanks
> Renuka Srishti



Highlighting the search keywords

2018-07-31 Thread Renuka Srishti
Hi All,

I was using highlighting in Solr. Solr returns the highlighting results in
the response, but they are not included within the documents themselves.
Am I missing something? Can I configure it so that the highlighted
keywords are shown within the matching documents?

Thanks
Renuka Srishti


Hit highlighting

2018-07-06 Thread Dwane Hall
Good evening solr community.  I have not had a lot of luck on another community 
source seeking advice on using the unified highlighter so I thought I'd try my 
luck with the solr experts.  Any recommendations would be appreciated when you 
get time.


Apache Solr 6.4 saw the release of the unified highlighter which, according to 
the Solr documentation, is the "most flexible and performant of the options. We 
recommend that you try this highlighter even though it isn’t the default 
(yet)".

With this in mind I've attempted to follow this recommendation and use it 
in a project I'm designing. The functionality works as expected, but I am unable 
to find a hl.requireFieldMatch equivalent for this highlighter. This means the 
entire hl.fl list is returned, with empty arrays for all fields that do not have a 
hit highlight associated with them (along with the successfully highlighted 
fields). These fields can be ignored by a client, but ideally they would not be 
passed to a calling client at all, as the list can be quite long, especially if a 
wildcard (*) is used in the hl.fl parameter.

With this in mind, would I be better off continuing with the unified highlighter 
and ignoring the additional non-highlighted field list, or falling back to the 
fastVector highlighter? How significant a performance improvement does the unified 
highlighter offer, and am I better off accepting the additional network data 
overhead to gain it? For reference, the index is a large one (400,000,000+ 
documents on Solr 7.3.1), so my initial instinct is to keep using the unified 
highlighter and remove as much stress on my Solr cluster as possible. Any 
advice or recommendations would be greatly appreciated.
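
Ignoring the empty arrays on the receiving side can be sketched as follows. This is only an illustration with made-up sample data; it runs after the response has been transferred, so it tidies what the application sees but does not reduce the network overhead unless it runs in a tier close to Solr.

```python
def drop_empty_highlights(highlighting):
    """Remove fields whose snippet list is empty, and documents left with no fields."""
    pruned = {}
    for doc_id, fields in highlighting.items():
        kept = {name: snippets for name, snippets in fields.items() if snippets}
        if kept:
            pruned[doc_id] = kept
    return pruned

# Hypothetical highlighting section with empty arrays for non-matching fields.
sample = {
    "doc1": {"title": ["<em>solr</em> guide"], "body": [], "author": []},
    "doc2": {"title": [], "body": []},
}
print(drop_empty_highlights(sample))
```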


Thanks.


DH



Re: querying vs. highlighting: complete freedom?

2018-04-03 Thread Arturas Mazeika
Hi David,

Thanks a lot for the reply and the information.

I suspected that the minimum on the indexing/storage side was that the hl.fl
fields need to be "stored". I understand that my expression "minimal requirements"
is loose and unclear. I wasn't sure how to formulate it because (i) I am
not yet sure how to express myself clearly using the language of the forum
and (ii) I was not sure what impact it has if another component is selected
(like the FastVector highlighter). Deep inside I had a feeling that some Solr
configurations would allow highlighting even without the "stored" property
set.

It came to my mind that the document nicely describes how to set up the
parameter hl.method (unified, original, fastVector). Similarly, there's the
hl.qparser parameter, but the documentation of that parameter is not as
rich (it says that the default value is lucene). I am
wondering whether other alternatives are available. If you are referring
to other components, can you add a reference to those?

With respect to your question of why I'd like to use the analysis chain for
highlighting: that is a very good question. Our end users cannot yet
distinguish between the highlighting capability of Solr/information retrieval
and a search for the occurrences of the query terms in the documents. It is a
rather difficult situation I am in. It is cool that there's a JIRA or two
on the load-balancing side.

Thanks!
Arturas

On Tue, Apr 3, 2018 at 4:29 PM, David Smiley <david.w.smi...@gmail.com>
wrote:

> Thanks for your review!
>
> On Tue, Apr 3, 2018 at 6:56 AM Arturas Mazeika <maze...@gmail.com> wrote:
> ...
>
> > What I missed at the beginning of the documentation is the minimal set of
> > requirements needed for highlighting to work sensibly: somehow I
> > have a feeling that one needs some of the information stored in schema in
> > some form. This of course is mentioned later on in the corresponding
> > section, but I'd write this explicitly.
> >
>
> Explicitly say what up front?  "Requirements" are somewhat loose/minimal.
> We ought to clearly say that hl.fl fields need to be "stored".
>
> ...
>
> > Is there a way to "load-balance" analyze-query-chain for the purpose of
> > highlighting matches? In the url below, I need to specify a specific
> core.
>
> ...
>
> I doubt it.  You'll have to do this yourself.  Why do you want to use this
> for highlighting?  Is it to get the offsets returned to you?  There's a
> JIRA or two for that already; someone ought to make that happen.
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>


Re: querying vs. highlighting: complete freedom?

2018-04-03 Thread David Smiley
Thanks for your review!

On Tue, Apr 3, 2018 at 6:56 AM Arturas Mazeika <maze...@gmail.com> wrote:
...

> What I missed at the beginning of the documentation is the minimal set of
> requirements needed for highlighting to work sensibly: somehow I
> have a feeling that one needs some of the information stored in schema in
> some form. This of course is mentioned later on in the corresponding
> section, but I'd write this explicitly.
>

Explicitly say what up front?  "Requirements" are somewhat loose/minimal.
We ought to clearly say that hl.fl fields need to be "stored".

...

> Is there a way to "load-balance" analyze-query-chain for the purpose of
> highlighting matches? In the url below, I need to specify a specific core.

...

I doubt it.  You'll have to do this yourself.  Why do you want to use this
for highlighting?  Is it to get the offsets returned to you?  There's a
JIRA or two for that already; someone ought to make that happen.
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: querying vs. highlighting: complete freedom?

2018-04-03 Thread Arturas Mazeika
Hi David,

Thanks a lot for the reply, for the effort to update the documentation, and
for having the documentation reflect the question I posted here.

I've read the doc you provided. I've read the updated parts and the
document as carefully as I could. I've browsed and skimmed part of the
document where it got rather detailed, especially the parts on the
original, unified and vector highlighters. I'll have to revisit those parts
as I deepen my understanding of information retrieval and Solr in
particular.

The updates in the document are helpful and improved it quite a
bit. I also agree that it is hard to document the problem and give a
solution to it. I see at least two reasons why this becomes very
challenging in this case: (i) the document aims to cover all options and
possibilities of highlighting in Solr, and (ii) the document aims to teach the
reader how to use highlighting in Solr. These aims conflict. If one
wants to cover the options and possibilities, one structures the content
hierarchically, starting with the most basic building blocks (jumping into
details first). If one aims at usage, one starts with the simplest possible
case that illustrates highlighting, followed by more complex use cases
illustrating more sophisticated and advanced cases (abstracting from details
and focusing on the big picture).

The first type of documentation tends to be long and
boring (check out the manuals provided by Microsoft; they perfected this style
of documenting, in my opinion). The second type of documentation repeats itself
constantly, or contains multiple references to outside material (as every new use
case is somewhat based on the previous one). You have sections that focus
on both aspects in the documentation: some give very simple,
targeted examples of how to use Solr, and some dig into the details.

What I missed at the beginning of the documentation is the minimal set of
requirements needed for highlighting to work sensibly: somehow I
have a feeling that one needs some of the information stored in the schema in
some form. This of course is mentioned later on in the corresponding
section, but I'd state it explicitly.

I still have a question that it would be really cool to get an answer to (it
is more about analysis and less about highlighting). My key question is:

Is there a way to "load-balance" analyze-query-chain for the purpose of
highlighting matches? In the url below, I need to specify a specific core.

http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
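
A request like the one above can also be assembled programmatically rather than by hand. The sketch below is only an illustration: analysis.showmatch, analysis.fieldvalue and analysis.query appear in this thread, while the name of the last parameter (analysis.fieldtype) is an assumption, because the archived URL has lost its parameter names. It still targets one specific core, so it does not answer the load-balancing question.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def build_field_analysis_url(core_url, field_value, query, field_type):
    """Build a field-analysis request URL for one specific core."""
    params = {
        "wt": "xml",
        "analysis.showmatch": "true",
        "analysis.fieldvalue": field_value,  # text to run through the index-time chain
        "analysis.query": query,             # text to run through the query-time chain
        "analysis.fieldtype": field_type,    # assumed parameter name, see the note above
    }
    return core_url.rstrip("/") + "/analysis/field?" + urlencode(params)

url = build_field_analysis_url(
    "http://localhost:8983/solr/trans_shard1_replica_n1",
    "Albert Einstein developed the theory of relativity",
    "reletivity theory",
    "text_en",
)
print(url)
# Decode the query string again to check what will be sent.
print(parse_qs(urlsplit(url).query)["analysis.query"])
```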


The context for this question is:

> Steven's hint pushed me further in this direction: he suggested using the
> query part of Solr to filter and sort out the relevant answers in the 1st
> step, and in the 2nd step highlighting all the keywords using CTRL+F (in
> the browser or some alternative viewer). This brought me to the next
> question:
>
> How can one match query terms with the analyze-chained documents in an
> efficient and distributed manner? My current understanding how to achieve
> this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the query
> 2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the
> document and the query
> 3. Use the matching of the substrings from the original text to last
> filter/tokenizer/analyzer in the analyze-chain to map the terms of the
> query
> 4. Emulate CTRL+F highlighting
>
> Web Interface of Solr offers quite a bit to advance towards this goal. If
> one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
> German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside quantum
> mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above url, I need to
> specify a specific core. Is it possible to generalize it, so the core that
> receives the request is not necessarily the one that processes it? Or this
> alread

Re: querying vs. highlighting: complete freedom?

2018-04-02 Thread David Smiley
Hi Arturas,

Both Erick and I had a go at improving the documentation here.  I hope it's
clearer.
https://builds.apache.org/job/Solr-reference-guide-master/javadoc/highlighting.html
The docs for hl.fl, hl.q, hl.qparser were all updated.  The meat of the
change was a new note in hl.fl including an example.  It's kinda hard to
document the problem you found but I hope the note will be somewhat
illustrative.
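
The field-qualified hl.q form discussed in the quoted messages below can be sketched like this. The field name `trans` and the term come from that discussion; the helper name and the remaining parameter values are illustrative assumptions, and the colon check is deliberately naive.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def qualify(term, field):
    """Prefix a bare term with a field name unless it already contains a colon (naive check)."""
    return term if ":" in term else f"{field}:{term}"

params = {
    "q": "trans:Kündigung",
    "hl": "on",
    "hl.fl": "trans",
    # Without the field prefix, hl.q would be parsed against the default search field.
    "hl.q": qualify("Kündigung", "trans"),
}
query_string = urlencode(params)  # non-ASCII characters are percent-encoded as UTF-8
print(query_string)
print(parse_qs(query_string)["hl.q"])
```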

~ David

On Mon, Mar 26, 2018 at 3:12 AM Arturas Mazeika <maze...@gmail.com> wrote:

> Hi Erick,
>
> Adding a field qualifier to the hl.q parameter solved the issue. My
> excitement is going through the roof! What a thorough answer: the
> explanation of the behavior of Solr, and how it tries to interpret what I
> mean when I supply a keyword without the field qualifier. Very impressive.
> Would you care to (re)post this answer to Stack Overflow? If that is too
> much of a hassle, I'll do this in a couple of days myself on your behalf.
>
> I am impressed by how well, thoroughly, quickly and fully the question was
> answered.
>
> Steven's hint pushed me further in this direction: he suggested using the
> query part of Solr to filter and sort out the relevant answers in the 1st
> step, and in the 2nd step highlighting all the keywords using CTRL+F (in
> the browser or some alternative viewer). This brought me to the next
> question:
>
> How can one match query terms with the analyze-chained documents in an
> efficient and distributed manner? My current understanding how to achieve
> this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the query
> 2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the
> document and the query
> 3. Use the matching of the substrings from the original text to last
> filter/tokenizer/analyzer in the analyze-chain to map the terms of the
> query
> 4. Emulate CTRL+F highlighting
>
> Web Interface of Solr offers quite a bit to advance towards this goal. If
> one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
> German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside quantum
> mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml=true=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).=reletivity%20theory=text_en
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above url, I need to
> specify a specific core. Is it possible to generalize it, so the core that
> receives the request is not necessarily the one that processes it? Or this
> already is distributed in a sense that receiving core and processing cores
> are never the same?
>
> 2. The document was already analyze-chained. Is is possible to store this
> information so one does not need to re-analyze-chain it once more?
>
> Cheers
> Arturas
>
> On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Arturas:
> >
> > Try to field-qualify your hl.q parameter. That looks like:
> >
> > hl.q=trans:Kundigung
> > or
> > hl.q=trans:Kündigung
> >
> > I saw the exact behavior you describe when I did _not_ specify the
> > field in the hl.q parameter, i.e.
> >
> > hl.q=Kundigung
> > or
> > hl.q=Kündigung
> >
> > didn't show all highlights.
> >
> > But when I did specify the field, it worked.
> >
> > Here's what I think is happening: Solr uses the default search
> > field when parsing an un-field-qualified query. I.e.
> >
> > q=something
> >
> > is parsed as
> >
> > q=default_search_field:something.
> >
> > The default field is controlled in solrconfig.xml with the "df"
> > parameter, you'll see entries like:
> > my_field
> >
> > Also when I changed the "df" parameter to the field I was highlighting
> > on, I didn't need to specify the field on the hl.q parameter.
> >
> > hl.q=Kundigung
> > or
> > hl.q=Kündigung
> >
> > The default  field is usually "text", which knows nothing about
> > the German-specific filters you've applied unless you changed it.
> >
> > So in the absence of a field-qualification for the hl.q parameter Solr
> > was parsing the query according to the ana

Re: querying vs. highlighting: complete freedom?

2018-03-26 Thread Erick Erickson
Arturas:

Thanks for the "atta boy's", but I have to confess I poked the
developers' list and the person (David Smiley) who, you know, actually
understands the highlighting code replied, and I passed it on ;)

I have great respect for the SO forum, but don't post to it since
there's only so much time in a day, so please feel free to put that
explanation over there.

As for the rest, I'll have to pass today; the aforementioned time
constraints are calling.

Best,
Erick


Re: querying vs. highlighting: complete freedom?

2018-03-26 Thread Arturas Mazeika
Hi Erick,

Adding a field qualifier to the hl.q parameter solved the issue. My
excitement is going through the roof! What a thorough answer: the
explanation of Solr's behavior, and of how it tries to interpret what I
mean when I supply a keyword without the field qualifier. Very impressive.
Would you care to (re)post this answer to Stack Overflow? If that is too
much of a hassle, I'll do it myself in a couple of days on your behalf.

I am impressed by how thoroughly, quickly and completely the question was
answered.

Stefan's hint pushed me further in this direction: he suggested using the
query part of Solr to filter and sort out the relevant answers in the 1st
step, and in the 2nd step he'd highlight all the keywords using CTRL+F (in
the browser or some alternative viewer). This brought me to the next
question:

How can one match query terms against the analyzed documents in an
efficient and distributed manner? My current understanding of how to achieve
this is the following:

1. Get the list of ids (contents) of the documents that match the query
2. Use the analysis screen at http://localhost:8983/solr/#/trans/analysis to
re-analyze the document and the query
3. Use the mapping from substrings of the original text to the output of the
last filter/tokenizer/analyzer in the analysis chain to locate the terms of
the query
4. Emulate CTRL+F highlighting
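
Steps 3 and 4 can be sketched roughly as follows. This is only an
illustration under stated assumptions: `normalize` is a hypothetical
stand-in for the last stage of an analysis chain (lowercasing plus umlaut
folding), not Solr's actual filters, and `highlight` is a made-up helper
name. The point is that matching happens on the analyzed form, while the
offsets point back into the original text.

```python
import re

# Toy folding table; an assumption standing in for a real folding filter.
FOLD = str.maketrans({"ü": "u", "ö": "o", "ä": "a"})

def normalize(token):
    # Hypothetical one-step "analysis chain": lowercase + fold umlauts.
    return token.lower().translate(FOLD)

def highlight(text, query, pre="<em>", post="</em>"):
    # Analyze the query terms with the same chain as the text tokens.
    wanted = {normalize(t) for t in query.split()}
    out, last = [], 0
    for m in re.finditer(r"\w+", text):
        if normalize(m.group()) in wanted:
            # Offsets refer to the original text, so its spelling is kept.
            out.append(text[last:m.start()])
            out.append(pre + m.group() + post)
            last = m.end()
    out.append(text[last:])
    return "".join(out)

print(highlight("Die Kündigung kam heute", "kundigung"))
```

Because both sides go through the same normalization, the folded query
term still marks the umlaut spelling in the original text.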

The Solr web interface offers quite a bit to advance towards this goal. If
one fires this request:

* analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
German-born theoretical physicist[5] who developed the theory of
relativity, one of the two pillars of modern physics (alongside quantum
mechanics).&
* analysis.query=reletivity theory

to one of the Solr cores, steps 1-3 are done:

http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml=true=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).=reletivity%20theory=text_en
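
The parameter names in the URL above were lost in the archive. A minimal
sketch of how such a request could be assembled, assuming Python's standard
library: `analysis.fieldvalue` and `analysis.query` are named in the
message itself, while `analysis.showmatch` and `analysis.fieldtype` are
assumptions about the names that belonged to the stripped `=true` and
`=text_en` parts. The field value is shortened here for readability.

```python
from urllib.parse import urlencode

# Host and core name copied from the URL in the message.
BASE = "http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field"

params = {
    "wt": "xml",
    "analysis.showmatch": "true",     # assumed name for the stripped "=true"
    "analysis.fieldvalue": "the theory of relativity",  # shortened sample text
    "analysis.query": "reletivity theory",              # query as sent in the thread
    "analysis.fieldtype": "text_en",  # assumed name for the stripped "=text_en"
}

# urlencode escapes spaces and other reserved characters for us.
url = BASE + "?" + urlencode(params)
print(url)
```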

Questions:

1. Is there a way to "load-balance" this? In the above URL, I need to
specify a specific core. Is it possible to generalize it, so that the core
that receives the request is not necessarily the one that processes it? Or
is this already distributed, in the sense that the receiving core and the
processing cores are never the same?

2. The document has already been through the analysis chain. Is it possible
to store this information so that one does not need to re-analyze it?

Cheers
Arturas


Re: querying vs. highlighting: complete freedom?

2018-03-23 Thread Erick Erickson
Arturas:

Try to field-qualify your hl.q parameter. That looks like:

hl.q=trans:Kundigung
or
hl.q=trans:Kündigung

I saw the exact behavior you describe when I did _not_ specify the
field in the hl.q parameter, i.e.

hl.q=Kundigung
or
hl.q=Kündigung

didn't show all highlights.

But when I did specify the field, it worked.
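
A minimal sketch of the two request variants, assuming Python's standard
library for building the URL. The host, the core name `trans` and
`q=trans:Zeit` come from the original question; `hl=true` and
`hl.fl=trans` are assumptions about the rest of the request, and
`highlight_url` is a made-up helper name.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/trans/select"  # core name from the thread

def highlight_url(hl_q):
    # q retrieves the documents; hl.q only controls what gets highlighted.
    params = {
        "q": "trans:Zeit",
        "hl": "true",      # assumed: highlighting switched on
        "hl.fl": "trans",  # assumed: highlight the searched field
        "hl.q": hl_q,
    }
    return BASE + "?" + urlencode(params)

unqualified = highlight_url("Kündigung")      # parsed against the default field
qualified = highlight_url("trans:Kündigung")  # parsed against the trans field
print(unqualified)
print(qualified)
```

The only difference between the two URLs is the `trans:` prefix inside
`hl.q`.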

Here's what I think is happening: Solr uses the default search
field when parsing an un-field-qualified query. I.e.

q=something

is parsed as

q=default_search_field:something.

The default field is controlled in solrconfig.xml with the "df"
parameter, you'll see entries like:
<str name="df">my_field</str>

Also when I changed the "df" parameter to the field I was highlighting
on, I didn't need to specify the field on the hl.q parameter.

hl.q=Kundigung
or
hl.q=Kündigung

The default field is usually "text", which knows nothing about
the German-specific filters you've applied unless you changed it.

So in the absence of a field-qualification for the hl.q parameter Solr
was parsing the query according to the analysis chain specified
in your default field, and probably passed ü through without
transforming it. Since your indexing analysis chain for that field
folded ü to just plain u, it wasn't found or highlighted.
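
The mismatch can be pictured with a toy model. This is a sketch only:
`analyze_trans` and `analyze_default` are hypothetical stand-ins for the
two analysis chains, not Solr's real filters, and the sample sentence is
invented.

```python
# Toy folding table; an assumption standing in for a German folding filter.
FOLD = str.maketrans({"ü": "u", "ö": "o", "ä": "a", "ß": "ss"})

def analyze_trans(text):
    # Hypothetical chain of the "trans" field: lowercase + fold umlauts.
    return [t.translate(FOLD) for t in text.lower().split()]

def analyze_default(text):
    # Hypothetical chain of the default field: lowercase only.
    return text.lower().split()

# Terms as they would end up in the index for the "trans" field.
indexed = set(analyze_trans("Die Kündigung kam heute"))

# hl.q=trans:Kündigung goes through the field's own chain and matches.
hit_with_field_chain = analyze_trans("Kündigung")[0] in indexed
# hl.q=Kündigung goes through the default field's chain and does not.
hit_with_default_chain = analyze_default("Kündigung")[0] in indexed

print(hit_with_field_chain, hit_with_default_chain)
```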

On the surface, this does seem like something that should be
changed, I'll go ahead and ping the dev list.

NOTE: I was trying this on Solr 7.1

Best,
Erick


Re: querying vs. highlighting: complete freedom?

2018-03-23 Thread Arturas Mazeika
Hi Erick,

Thanks for the update and the info. Your post shed quite a bit of light
on the picture, and I now understand much more of what you are
saying. Your explanation makes sense and can be quite useful in certain
scenarios.

What struck me in your description is that you are saying that the
analyzer chain needs to be applied to the highlighting queries as well.
The tragedy is that I am not able to get this for a German collection: if
only the query is set (no explicit highlighting query), the highlighting is
correct. It is also correct if I replace the umlauts with the
corresponding Latin characters. Getting the analyzer chain applied to the
highlighting terms remains the challenge.

Could you have a look at the following Stack Overflow link? Maybe
something comes to mind...

https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted

Cheers,

Arturas
Re: querying vs. highlighting: complete freedom?

2018-03-23 Thread Erick Erickson
bq: this is not a typical case that one searches for a keyword but
highlights something else

This isn't really an unusual case; apparently I misled you.

What I was trying to convey is that the analysis chain used is firmly
attached to a particular _field_. There's no way to say "use one
analysis chain for the query and another for highlighting on the
_same_ field".

You can use two different fields with different analysis chains, one
for each purpose. So something like

q=f1:something&hl.fl=f2,f3&hl.q=other

is certainly reasonable. It'll search for "something" in f1, and
highlight "other" in f2 and f3.

Each field processes its input with the analysis chain defined in the schema.

The rest about stored="true" can be ignored, it's just me wandering
off into the weeds about an optimization that only stores the data
once rather than redundantly in multiple fields.

Best,
Erick


Re: querying vs. highlighting: complete freedom?

2018-03-23 Thread Arturas Mazeika
Hi Matheis (Stefan),

Thanks for the questions. They made me look at the problem from a distance
and re-frame the situation. Good questions indeed.

To come at it from another angle: consider a user who describes herself as
a BMW fan, convinced that every BMW needs to be the blackest color possible
(for the sake of argument). She would like to search and later browse the
entries in a discussion forum (not everything, of course, only BMWs of the
blackest color), and what interests her are the snippets that contain
"understood", "craziest" or the like as keywords (because she is looking
for a dozen discussions that she saw before).

What I have not been able to achieve so far is: (i) combining the query
terms for filtering and highlighting, (ii) using the analyzer chain of the
attribute to rewrite the highlight query (or defining one in the search).

The CTRL+F technique is a very powerful one, indeed. It works most of the
time. The difficulties with it are query rewriting, enriching, etc.

Cheers,
Arturas



Re: querying vs. highlighting: complete freedom?

2018-03-23 Thread Stefan Matheis
Perhaps we should try it the other way round: what's your use case for
this? I'm trying to think of a situation where I'd need this as a user.

The only reason I see myself doing this is CTRL+F in a page when the search
result is not immediately visible to me ;)

On Mar 23, 2018 9:41 AM, "Arturas Mazeika" <maze...@gmail.com> wrote:

> Hi Erick et al,
>
> From your answer I understand that this is not a typical case that one
> searches for a keyword but highlights something else. Since we have two
> parameters (q vs hl.q) I thought they are freely combinable. From your
> answer I understand that this is not really the case. My current
> understanding came from [1] that says:
>
> hl.q
>
> A query to use for highlighting. This parameter allows you to highlight
> different terms than those being used to retrieve documents.
> what I hear from you is something different: i.e., that this is not enough
> just to combine the q with hl.q, that there are caveats to achieve the task
> (multiple fields, FastVectorHighlighter).
>
> Your infos are very helpful.
>
> Cheers,
> Arturas
>
> [1]  https://lucene.apache.org/solr/guide/7_2/highlighting.html
>
> On Thu, Mar 22, 2018 at 4:07 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Basically you need to use a copyField, but in several variants:
> >
> > If you use the field _exclusively_ for highlighting then store the raw
> > content there and have the field use whatever analyzer you want. You
> > do _not_ need to have indexed="true" set for the field if you're
> > highlighting on the fly. So you're searching against field1 (which has
> > indexed="true" stored="false" set) but highlighting against field2
> > (which has indexed="false" stored="true" set). Of course any time you
> > want to return the contents in a doc your fl needs to specify
> > field2...
> >
> > The above does not bloat your index at all since the cost of
> > stored="true" indexed="true" is the same as if you use two fields,
> > each with only one option turned on.
> >
> > The second approach if you want to use FastVectorHighlighter or the
> > like is simply to index both fields.
> >
> > Best,
> > Erick
> >
> > On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <maze...@gmail.com>
> > wrote:
> > > Hi Solr-Users,
> > >
> > > I've been playing with a german collection of documents, where I tried
> to
> > > search for one word (q=Tag) and highlighted another: (hl.q=Kundigung).
> Is
> > > this a "legal" use case? My key question is how can I tell solr which
> > query
> > > analyzer to use for highlighting? Strictly speaking, I should use
> > > hl.q=Kündigung to conceptually look for relevant information, but in
> this
> > > case, no highlighting is returned (as all umlauts are left out in the
> > > index) .
> > >
> > > Additional infos:
> > >
> > > solr version: 7.2
> > > urls to query:
> > >
> > > http://localhost:8983/solr/trans/select?q=trans:Zeit=
> > true=trans=Kundigung=3=xml=1
> > >
> > > http://localhost:8983/solr/trans/select?q=trans:Zeit=
> > true=trans=K%C3%BCndigung=3=xml=1
> > > <http://localhost:8983/solr/trans/select?q=trans:Zeit=
> > true=trans=Kundigung=3=xml=1>
> > >
> > > Managed-schema:
> > >
> > >> positionIncrementGap="100">
> > > 
> > >   
> > >   
> > >> > words="lang/stopwords_de.txt" ignoreCase="true"/>
> > >   
> > >   
> > > 
> > >   
> > >
> > >
> > > Other additional infos:
> > > https://stackoverflow.com/questions/49276093/solr-
> > highlighting-terms-with-umlaut-not-found-not-highlighted
> > >
> > > Cheers,
> > > Arturas
> >
>


Re: querying vs. highlighting: complete freedom?

2018-03-23 Thread Arturas Mazeika
Hi Erick et al,

From your answer I understand that this is not a typical case that one
searches for a keyword but highlights something else. Since we have two
parameters (q vs hl.q) I thought they are freely combinable. From your
answer I understand that this is not really the case. My current
understanding came from [1] that says:

hl.q

A query to use for highlighting. This parameter allows you to highlight
different terms than those being used to retrieve documents.
What I hear from you is something different: that it is not enough just to
combine q with hl.q, and that there are caveats to achieving the task
(multiple fields, FastVectorHighlighter).
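
For reference, here is a minimal sketch of how a request that retrieves with
one query and highlights another could be assembled. It assumes the standard
highlighting parameters hl, hl.fl and hl.q; the collection and field name
"trans" come from the URLs in this thread, and build_highlight_url is a
hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, highlight_query, field):
    """Build a select URL that retrieves with `query` but highlights `highlight_query`."""
    params = {
        "q": query,               # selects the documents
        "hl": "true",             # turns highlighting on
        "hl.fl": field,           # field to take snippets from
        "hl.q": highlight_query,  # terms to highlight instead of those in q
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default
    return base_url + "?" + urlencode(params)


url = build_highlight_url(
    "http://localhost:8983/solr/trans/select",  # assumed local Solr instance
    "trans:Zeit",
    "Kündigung",
    "trans",
)
print(url)
```

Whether the highlighted terms actually match still depends on how the
highlighted field is analyzed, which is the caveat discussed in this thread.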

Your infos are very helpful.

Cheers,
Arturas

[1]  https://lucene.apache.org/solr/guide/7_2/highlighting.html

On Thu, Mar 22, 2018 at 4:07 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Basically you need to use a copyField, but in several variants:
>
> If you use the field _exclusively_ for highlighting then store the raw
> content there and have the field use whatever analyzer you want. You
> do _not_ need to have indexed="true" set for the field if you're
> highlighting on the fly. So you're searching against field1 (which has
> indexed="true" stored="false" set) but highlighting against field2
> (which has indexed="false" stored="true" set). Of course any time you
> want to return the contents in a doc your fl needs to specify
> field2...
>
> The above does not bloat your index at all since the cost of
> stored="true" indexed="true" is the same as if you use two fields,
> each with only one option turned on.
>
> The second approach if you want to use FastVectorHighlighter or the
> like is simply to index both fields.
>
> Best,
> Erick
>
> On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <maze...@gmail.com>
> wrote:
> > Hi Solr-Users,
> >
> > I've been playing with a german collection of documents, where I tried to
> > search for one word (q=Tag) and highlighted another: (hl.q=Kundigung). Is
> > this a "legal" use case? My key question is how can I tell solr which
> query
> > analyzer to use for highlighting? Strictly speaking, I should use
> > hl.q=Kündigung to conceptually look for relevant information, but in this
> > case, no highlighting is returned (as all umlauts are left out in the
> > index) .
> >
> > Additional infos:
> >
> > solr version: 7.2
> > urls to query:
> >
> > http://localhost:8983/solr/trans/select?q=trans:Zeit=
> true=trans=Kundigung=3=xml=1
> >
> > http://localhost:8983/solr/trans/select?q=trans:Zeit=
> true=trans=K%C3%BCndigung=3=xml=1
> > <http://localhost:8983/solr/trans/select?q=trans:Zeit=
> true=trans=Kundigung=3=xml=1>
> >
> > Managed-schema:
> >
> >positionIncrementGap="100">
> > 
> >   
> >   
> >> words="lang/stopwords_de.txt" ignoreCase="true"/>
> >   
> >   
> > 
> >   
> >
> >
> > Other additional infos:
> > https://stackoverflow.com/questions/49276093/solr-
> highlighting-terms-with-umlaut-not-found-not-highlighted
> >
> > Cheers,
> > Arturas
>


Re: querying vs. highlighting: complete freedom?

2018-03-22 Thread Erick Erickson
Basically you need to use a copyField, but in several variants:

If you use the field _exclusively_ for highlighting then store the raw
content there and have the field use whatever analyzer you want. You
do _not_ need to have indexed="true" set for the field if you're
highlighting on the fly. So you're searching against field1 (which has
indexed="true" stored="false" set) but highlighting against field2
(which has indexed="false" stored="true" set). Of course any time you
want to return the contents in a doc your fl needs to specify
field2...

The above does not bloat your index at all since the cost of
stored="true" indexed="true" is the same as if you use two fields,
each with only one option turned on.

The second approach if you want to use FastVectorHighlighter or the
like is simply to index both fields.
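
The first variant above can be sketched as a Schema API request body. This is
only an illustration under assumptions: the field names field1 and field2
follow the text above, the field types "text_de" and "text_general" are
assumed to exist in the schema, and highlight_copy_schema is a hypothetical
helper. The body would be POSTed as JSON to the collection's /schema endpoint.

```python
import json


def highlight_copy_schema(search_field, highlight_field, search_type, highlight_type):
    """Return a Schema API body: one search-only field, one highlight-only field."""
    return {
        "add-field": [
            # searched against: indexed but not stored
            {"name": search_field, "type": search_type,
             "indexed": True, "stored": False},
            # highlighted against: stored but not indexed
            {"name": highlight_field, "type": highlight_type,
             "indexed": False, "stored": True},
        ],
        # copy the raw input of the search field into the highlight field
        "add-copy-field": {"source": search_field, "dest": highlight_field},
    }


payload = highlight_copy_schema("field1", "field2", "text_de", "text_general")
print(json.dumps(payload, indent=2))
```

For the second variant (FastVectorHighlighter or similar), the second field
would be indexed as well.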

Best,
Erick

On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <maze...@gmail.com> wrote:
> Hi Solr-Users,
>
> I've been playing with a german collection of documents, where I tried to
> search for one word (q=Tag) and highlighted another: (hl.q=Kundigung). Is
> this a "legal" use case? My key question is how can I tell solr which query
> analyzer to use for highlighting? Strictly speaking, I should use
> hl.q=Kündigung to conceptually look for relevant information, but in this
> case, no highlighting is returned (as all umlauts are left out in the
> index) .
>
> Additional infos:
>
> solr version: 7.2
> urls to query:
>
> http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=Kundigung=3=xml=1
>
> http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=K%C3%BCndigung=3=xml=1
> <http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=Kundigung=3=xml=1>
>
> Managed-schema:
>
>   
> 
>   
>   
>words="lang/stopwords_de.txt" ignoreCase="true"/>
>   
>   
> 
>   
>
>
> Other additional infos:
> https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>
> Cheers,
> Arturas


querying vs. highlighting: complete freedom?

2018-03-22 Thread Arturas Mazeika
Hi Solr-Users,

I've been playing with a German collection of documents, where I tried to
search for one word (q=Tag) and highlight another (hl.q=Kundigung). Is
this a "legal" use case? My key question is: how can I tell Solr which query
analyzer to use for highlighting? Strictly speaking, I should use
hl.q=Kündigung to conceptually look for relevant information, but in this
case no highlighting is returned (as all umlauts are left out in the
index).
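
To illustrate the mismatch: if the index holds "Kundigung" while the
highlight query says "Kündigung", the two strings only meet once the
diacritics are folded away. Below is a small client-side sketch of such
folding. It is an assumption-laden workaround, not what Solr's German
analysis does internally (for example, it leaves "ß" untouched), and
fold_diacritics is a hypothetical helper.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so that e.g. 'ü' becomes 'u'."""
    # NFD splits 'ü' into 'u' plus a combining diaeresis (U+0308)
    decomposed = unicodedata.normalize("NFD", text)
    # keep only characters that are not combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("Kündigung"))  # Kundigung
```

Folding hl.q like this before sending it would reproduce the variant that
does return highlights (hl.q=Kundigung).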

Additional infos:

solr version: 7.2
urls to query:

http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=Kundigung=3=xml=1

http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=K%C3%BCndigung=3=xml=1
<http://localhost:8983/solr/trans/select?q=trans:Zeit=true=trans=Kundigung=3=xml=1>

Managed-schema:

  

  
  
  
  
  

  


Other additional infos:
https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted

Cheers,
Arturas


Highlighting over date fields

2018-02-07 Thread LOPEZ-CORTES Mariano-ext
Is it possible to use highlighting over date fields?

We've tried but we've got no highlighting response for the field.



Highlighting keywords which are not in close proximity with in a field

2018-01-22 Thread sasarun
Hi All, 

Currently, when I search for the phrase "Artificial Intelligence in space",
the keyword Artificial Intelligence is highlighted because it occurs more
often in the document, mostly near the start. The word Space, however,
appears only near the bottom of the document, so it is not shown in the
highlighting blob.
Is there a way to highlight keywords that are not in close proximity to each
other?

Thanks
Arun





Re: Need help with solr highlighting feature

2018-01-18 Thread Steve Rowe
Hi Aashish,

Thanks for letting us know.

--
Steve
www.lucidworks.com

> On Jan 17, 2018, at 1:41 PM, Aashish Agarwal <aaashi...@gmail.com> wrote:
> 
> Hello Steve,
> 
> Sorry to disturb, the issue was due to custom tokenizer that I used. Since
> that was not storing offset so term vector was not working.
> Its resolved now.
> 
> On Jan 17, 2018 11:06 PM, "Steve Rowe" <sar...@gmail.com> wrote:
> 
>> Hi Aashish,
>> 
>> Which version of Solr are you using?
>> 
>> Please share your configuration: highlighter and schema.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jan 16, 2018, at 12:20 PM, Aashish Agarwal <aaashi...@gmail.com>
>> wrote:
>>> 
>>> Hello,
>>> 
>>> I am using solr highlighting feature on multivalued field containing
>> korean
>>> words.The feature is not working as expected. Search is working fine but
>> in
>>> case of highlighting it gives response as .
>>> 
>>> I am storing term vector for the field and it is also stored=true.
>>> 
>>> Please reply soon. Need this feature working urgently.
>>> 
>>> Thanks,
>>> Aashish
>> 
>> 



Re: Need help with solr highlighting feature

2018-01-17 Thread Aashish Agarwal
Hello Steve,

Sorry to disturb; the issue was due to a custom tokenizer that I used. Since
it was not storing offsets, the term vector highlighting was not working.
It's resolved now.
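
As a rough illustration of why offsets matter, here is a plain-Python sketch
(not the Lucene tokenizer API): the tokenizer records the start and end
character position of each token, and the highlighter uses those positions to
wrap the matching text. Without them it has nothing to wrap. Both function
names are hypothetical.

```python
import re


def tokenize_with_offsets(text):
    """Return (token, start, end) triples; the offsets point back into `text`."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def highlight(text, term, pre="<em>", post="</em>"):
    """Wrap every token equal to `term` (case-insensitive) using its offsets."""
    out = []
    last = 0
    for token, start, end in tokenize_with_offsets(text):
        if token.lower() == term.lower():
            out.append(text[last:start])               # text before the match
            out.append(pre + text[start:end] + post)   # the match itself
            last = end
    out.append(text[last:])                            # remaining tail
    return "".join(out)


print(highlight("Solr search and Solr highlighting", "solr"))
# <em>Solr</em> search and <em>Solr</em> highlighting
```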

On Jan 17, 2018 11:06 PM, "Steve Rowe" <sar...@gmail.com> wrote:

> Hi Aashish,
>
> Which version of Solr are you using?
>
> Please share your configuration: highlighter and schema.
>
> --
> Steve
> www.lucidworks.com
>
> > On Jan 16, 2018, at 12:20 PM, Aashish Agarwal <aaashi...@gmail.com>
> wrote:
> >
> > Hello,
> >
> > I am using solr highlighting feature on multivalued field containing
> korean
> > words.The feature is not working as expected. Search is working fine but
> in
> > case of highlighting it gives response as .
> >
> > I am storing term vector for the field and it is also stored=true.
> >
> > Please reply soon. Need this feature working urgently.
> >
> > Thanks,
> > Aashish
>
>


Re: Need help with solr highlighting feature

2018-01-17 Thread Steve Rowe
Hi Aashish,

Which version of Solr are you using?

Please share your configuration: highlighter and schema.

--
Steve
www.lucidworks.com

> On Jan 16, 2018, at 12:20 PM, Aashish Agarwal <aaashi...@gmail.com> wrote:
> 
> Hello,
> 
> I am using solr highlighting feature on multivalued field containing korean
> words.The feature is not working as expected. Search is working fine but in
> case of highlighting it gives response as .
> 
> I am storing term vector for the field and it is also stored=true.
> 
> Please reply soon. Need this feature working urgently.
> 
> Thanks,
> Aashish



Need help with solr highlighting feature

2018-01-16 Thread Aashish Agarwal
Hello,

I am using the Solr highlighting feature on a multivalued field containing
Korean words. The feature is not working as expected. Search is working fine,
but in case of highlighting it gives response as .

I am storing term vector for the field and it is also stored=true.

Please reply soon. Need this feature working urgently.

Thanks,
Aashish

