Re: Highlighting tag problem

2015-12-07 Thread Erick Erickson
bq: So the fields in the fl will affect the fields that will be highlighted?

No. My pedantic point was only that one of the replies could be read as
saying the fl parameter affects which fields are _searched_. It doesn't: fl
only specifies the fields _returned_, and hl.fl the fields highlighted.
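
To make the three roles concrete, here is a minimal sketch that compares the two field lists from the configuration posted in this thread. It is an illustration only: the class and method names are made up, and treating "text" as the default search field (df) is an assumption, since the parameter names were stripped from the archived config.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldRolesSketch {

    // Splits a comma- or space-delimited field list into field names.
    static List<String> fieldList(String raw) {
        List<String> fields = new ArrayList<>();
        for (String part : raw.split("[,\\s]+")) {
            if (!part.isEmpty()) {
                fields.add(part);
            }
        }
        return fields;
    }

    // Returns the fields present in left but absent from right, in order.
    static List<String> onlyIn(List<String> left, List<String> right) {
        List<String> result = new ArrayList<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        String df = "text"; // assumed default search field; decides what is searched
        List<String> fl = fieldList("id, title, content_type, last_modified, url, score");
        List<String> hlFl = fieldList("id, title, content, author, tag");

        System.out.println("searched (df):             " + df);
        System.out.println("returned, not highlighted: " + onlyIn(fl, hlFl));
        System.out.println("highlighted, not returned: " + onlyIn(hlFl, fl));
    }
}
```

Neither list says anything about which fields are searched; that is decided by df (or qf with edismax).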

On Mon, Dec 7, 2015 at 2:43 PM, Zheng Lin Edwin Yeo
<edwinye...@gmail.com> wrote:
> So the fields in the fl will affect the fields that will be highlighted?
>
> Shouldn't only the fields specified in hl.fl be highlighted? I found that
> some fields not specified in hl.fl also got highlighted, but because such a
> field is not specified in hl.fl, it is not shown in the result set, so the
> result set shows a record that doesn't have any highlight in it.
>
> Regards,
> Edwin
>
>
> On 8 Dec 2015 2:30 am, "Erick Erickson" <erickerick...@gmail.com> wrote:
>>
>> Pedantry here:
>>
>> bq: Unless you override fl or hl.fl in url parameters you can get a hit in
>> content_type, last_modified, url, or score and those fields will not get
>> highlighted.
>>
>> In the main correct, but the phrasing makes it seem like the fl parameter
>> has something to do with the fields _searched_, when it just
>> specifies the fields _returned_. Perhaps you're thinking of qf in
>> edismax? Or df?...
>>
>> It's spot on that the hl.fl fields are all that's highlighted and this is
>> probably the issue the OP had.
>>
>> Best,
>> Erick
>>
>> On Mon, Dec 7, 2015 at 9:22 AM, Scott Stults
>> <sstu...@opensourceconnections.com> wrote:
>> > I see. There appears to be a gap in what you can match on and what will get
>> > highlighted:
>> >
>> > fl: id, title, content_type, last_modified, url, score
>> >
>> > hl.fl: id, title, content, author, tag
>> >
>> > Unless you override fl or hl.fl in url parameters you can get a hit in
>> > content_type, last_modified, url, or score and those fields will not get
>> > highlighted. Try adding those fields to hl.fl.
>> >
>> >
>> > k/r,
>> > Scott
>> >
>> > On Fri, Dec 4, 2015 at 12:59 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> > wrote:
>> >
>> >> Hi Scott,
>> >>
>> >> No, what's described in SOLR-8334 is the highlight tag appearing in the
>> >> result, but at the wrong position.
>> >>
>> >> In this problem, when I run a highlight query, some of the results in the
>> >> result set do not contain the search word in title, content_type,
>> >> last_modified or url, as specified in the solrconfig.xml I posted earlier,
>> >> and there is no highlight tag in those results. So I'm not sure why those
>> >> results are returned.
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>> >>
>> >> On 4 December 2015 at 01:03, Scott Stults <
>> >> sstu...@opensourceconnections.com
>> >> > wrote:
>> >>
>> >> > Edwin,
>> >> >
>> >> > Is this related to what's described in SOLR-8334?
>> >> >
>> >> >
>> >> > k/r,
>> >> > Scott
>> >> >
>> >> > On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo <
>> >> edwinye...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Hi,
>> >> > >
>> >> > > I'm using Solr 5.3.0.
>> >> > > I would like to find out why, during a search, there is sometimes a match in
>> >> > > content that is not highlighted (the word is not in the stopword list).
>> >> > > Did I make any mistakes in my configuration?
>> >> > >
>> >> > > This is my highlighting request handler from solrconfig.xml.
>> >> > >
>> >> > > 
>> >> > > 
>> >> > > explicit
>> >> > > 10
>> >> > > json
>> >> > > true
>> >> > > text
>> >> > > id, title, content_type, last_modified, url, score
>> >> 
>> >> > >
>> >> > > on
>> >> > > id, title, content, author, tag
>> >> > >true
>> >> > > true
>> >> > > html
>> >> > > 200
>> >> > >
>> >> > > true
>> >> > > signature
>> >

Re: Highlighting tag problem

2015-12-07 Thread Erick Erickson
Pedantry here:

bq: Unless you override fl or hl.fl in url parameters you can get a hit in
content_type, last_modified, url, or score and those fields will not get
highlighted.

In the main correct, but the phrasing makes it seem like the fl parameter
has something to do with the fields _searched_, when it just
specifies the fields _returned_. Perhaps you're thinking of qf in
edismax? Or df?...

It's spot on that the hl.fl fields are all that's highlighted and this is
probably the issue the OP had.

Best,
Erick

On Mon, Dec 7, 2015 at 9:22 AM, Scott Stults
<sstu...@opensourceconnections.com> wrote:
> I see. There appears to be a gap in what you can match on and what will get
> highlighted:
>
> fl: id, title, content_type, last_modified, url, score
>
> hl.fl: id, title, content, author, tag
>
> Unless you override fl or hl.fl in url parameters you can get a hit in
> content_type, last_modified, url, or score and those fields will not get
> highlighted. Try adding those fields to hl.fl.
>
>
> k/r,
> Scott
>
> On Fri, Dec 4, 2015 at 12:59 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
>> Hi Scott,
>>
>> No, what's described in SOLR-8334 is the highlight tag appearing in the
>> result, but at the wrong position.
>>
>> In this problem, when I run a highlight query, some of the results in the
>> result set do not contain the search word in title, content_type,
>> last_modified or url, as specified in the solrconfig.xml I posted earlier,
>> and there is no highlight tag in those results. So I'm not sure why those
>> results are returned.
>>
>> Regards,
>> Edwin
>>
>>
>> On 4 December 2015 at 01:03, Scott Stults <
>> sstu...@opensourceconnections.com
>> > wrote:
>>
>> > Edwin,
>> >
>> > Is this related to what's described in SOLR-8334?
>> >
>> >
>> > k/r,
>> > Scott
>> >
>> > On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm using Solr 5.3.0.
>> > > I would like to find out why, during a search, there is sometimes a match in
>> > > content that is not highlighted (the word is not in the stopword list).
>> > > Did I make any mistakes in my configuration?
>> > >
>> > > This is my highlighting request handler from solrconfig.xml.
>> > >
>> > > 
>> > > 
>> > > explicit
>> > > 10
>> > > json
>> > > true
>> > > text
>> > > id, title, content_type, last_modified, url, score
>> 
>> > >
>> > > on
>> > > id, title, content, author, tag
>> > >true
>> > > true
>> > > html
>> > > 200
>> > >
>> > > true
>> > > signature
>> > > true
>> > > 100
>> > > 
>> > > 
>> > >
>> > >
>> > > This is my pipeline for the field.
>> > >
>> > >  > > > positionIncrementGap="100">
>> > >
>> > >
>> > >
>> > >> class="analyzer.solr5.jieba.JiebaTokenizerFactory"
>> > > segMode="SEARCH"/>
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>> > >
>> > >> > > words="stopwords.txt" />
>> > >
>> > >> > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> > >
>> > >> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>> > >
>> > >
>> > >
>> > >> > > maxGramSize="15"/>
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >> class="analyzer.solr5.jieba.JiebaTokenizerFactory"
>> > > segMode="SEARCH"/>
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>> > >
>> > >> > > words="stopwords.txt" />
>> > >
>> > >> > > generateWordParts="0" generateNumberParts="0" catenateWords="0"
>> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>> > >
>> > >> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>> > >
>> > >
>> > >
>> > > 
>> > >
>> > >  
>> > >
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> >
>> >
>> >
>> > --
>> > Scott Stults | Founder & Solutions Architect | OpenSource Connections,
>> LLC
>> > | 434.409.2780
>> > http://www.opensourceconnections.com
>> >
>>
>
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com


Re: Highlighting large documents

2015-12-04 Thread Zheng Lin Edwin Yeo
Hi Andrea,

I'm using the original highlighter.

Below is my configuration for the highlighter in solrconfig.xml

  
   
   explicit
   10
   json
   true
  text
  id, title, content_type, last_modified, url, score 

  on
   id, title, content, author 
  true
   true
   html
  200
  100

true
signature
true
100
  
  


Have you managed to solve the problem?

Regards,
Edwin


On 4 December 2015 at 23:54, Andrea Gazzarini <a.gazzar...@gmail.com> wrote:

> Hi Zheng,
> just out of curiosity, because I will shortly have to deal with a similar
> scenario (Solr 5.3.1 + large documents + highlighting).
> Which highlighter are you using?
>
> Andrea
>
> 2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
> > Hi,
> >
> > I'm using Solr 5.3.0
> >
> > I've found that with large documents, a highlight query sometimes returns
> > results that contain no highlighted text. The documents do contain matches,
> > but they are located further back in the document.
> >
> > I have tried increasing the value of hl.maxAnalyzedChars, as the default
> > value is 51200 and I have documents that are much larger than 51200
> > characters. This works, but when I increase the value, search and highlight
> > performance drops: response time can go from less than 0.5 seconds to more
> > than 10 seconds.
> >
> > Is increasing hl.maxAnalyzedChars the best approach, or are there other ways
> > to achieve the same result without affecting performance as much?
> >
> > Regards,
> > Edwin
> >
>


Re: Highlighting large documents

2015-12-04 Thread Andrea Gazzarini
Hi Zheng,
just out of curiosity, because I will shortly have to deal with a similar
scenario (Solr 5.3.1 + large documents + highlighting).
Which highlighter are you using?

Andrea

2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

> Hi,
>
> I'm using Solr 5.3.0
>
> I've found that with large documents, a highlight query sometimes returns
> results that contain no highlighted text. The documents do contain matches,
> but they are located further back in the document.
>
> I have tried increasing the value of hl.maxAnalyzedChars, as the default
> value is 51200 and I have documents that are much larger than 51200
> characters. This works, but when I increase the value, search and highlight
> performance drops: response time can go from less than 0.5 seconds to more
> than 10 seconds.
>
> Is increasing hl.maxAnalyzedChars the best approach, or are there other ways
> to achieve the same result without affecting performance as much?
>
> Regards,
> Edwin
>


Highlighting large documents

2015-12-04 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.3.0

I've found that with large documents, a highlight query sometimes returns
results that contain no highlighted text. The documents do contain matches,
but they are located further back in the document.

I have tried increasing the value of hl.maxAnalyzedChars, as the default
value is 51200 and I have documents that are much larger than 51200
characters. This works, but when I increase the value, search and highlight
performance drops: response time can go from less than 0.5 seconds to more
than 10 seconds.

Is increasing hl.maxAnalyzedChars the best approach, or are there other ways
to achieve the same result without affecting performance as much?

Regards,
Edwin
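
The behaviour described above can be pictured with a toy model. This is only a sketch of the idea that text past the analysis limit is never examined; it is not Solr's highlighter code, and the class and method names are invented for illustration.

```java
final class MaxAnalyzedCharsSketch {

    // Toy model only: looks for term in the first maxAnalyzedChars
    // characters of text. Returns the match index, or -1 if the term
    // does not occur inside that window.
    static int firstMatchWithinLimit(String text, String term, int maxAnalyzedChars) {
        int limit = Math.min(text.length(), Math.max(0, maxAnalyzedChars));
        return text.substring(0, limit).indexOf(term);
    }

    public static void main(String[] args) {
        // Build a document whose only match sits at offset 60000.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 60000; i++) {
            sb.append('x');
        }
        sb.append("needle");
        String doc = sb.toString();

        // With the default limit the match lies past the analysed window.
        System.out.println(firstMatchWithinLimit(doc, "needle", 51200));
        // With a larger limit the window covers the match.
        System.out.println(firstMatchWithinLimit(doc, "needle", 70000));
    }
}
```

A larger window finds the late match but also means more text to analyse per document, which fits the slowdown reported above.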


Re: Highlighting large documents

2015-12-04 Thread Andrea Gazzarini
No, sorry, the project hasn't started yet, so I haven't run into your
issue, but I'll be following this thread closely.

Best,
Andrea

2015-12-04 17:04 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

> Hi Andrea,
>
> I'm using the original highlighter.
>
> Below is my configuration for the highlighter in solrconfig.xml
>
>   
>
>explicit
>10
>json
>true
>   text
>   id, title, content_type, last_modified, url, score 
>
>   on
>id, title, content, author 
>   true
>true
>html
>   200
>   100
>
> true
> signature
> true
> 100
>   
>   
>
>
> Have you managed to solve the problem?
>
> Regards,
> Edwin
>
>
> On 4 December 2015 at 23:54, Andrea Gazzarini <a.gazzar...@gmail.com>
> wrote:
>
> > Hi Zheng,
> > just out of curiosity, because I will shortly have to deal with a similar
> > scenario (Solr 5.3.1 + large documents + highlighting).
> > Which highlighter are you using?
> >
> > Andrea
> >
> > 2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> >
> > > Hi,
> > >
> > > I'm using Solr 5.3.0
> > >
> > > I've found that with large documents, a highlight query sometimes returns
> > > results that contain no highlighted text. The documents do contain matches,
> > > but they are located further back in the document.
> > >
> > > I have tried increasing the value of hl.maxAnalyzedChars, as the default
> > > value is 51200 and I have documents that are much larger than 51200
> > > characters. This works, but when I increase the value, search and highlight
> > > performance drops: response time can go from less than 0.5 seconds to more
> > > than 10 seconds.
> > >
> > > Is increasing hl.maxAnalyzedChars the best approach, or are there other ways
> > > to achieve the same result without affecting performance as much?
> > >
> > > Regards,
> > > Edwin
> > >
> >
>


Re: Highlighting tag problem

2015-12-03 Thread Zheng Lin Edwin Yeo
Hi Scott,

No, what's described in SOLR-8334 is the highlight tag appearing in the
result, but at the wrong position.

In this problem, when I run a highlight query, some of the results in the
result set do not contain the search word in title, content_type,
last_modified or url, as specified in the solrconfig.xml I posted earlier,
and there is no highlight tag in those results. So I'm not sure why those
results are returned.

Regards,
Edwin


On 4 December 2015 at 01:03, Scott Stults <sstu...@opensourceconnections.com
> wrote:

> Edwin,
>
> Is this related to what's described in SOLR-8334?
>
>
> k/r,
> Scott
>
> On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm using Solr 5.3.0.
> > I would like to find out why, during a search, there is sometimes a match in
> > content that is not highlighted (the word is not in the stopword list).
> > Did I make any mistakes in my configuration?
> >
> > This is my highlighting request handler from solrconfig.xml.
> >
> > 
> > 
> > explicit
> > 10
> > json
> > true
> > text
> > id, title, content_type, last_modified, url, score 
> >
> > on
> > id, title, content, author, tag
> >true
> > true
> > html
> > 200
> >
> > true
> > signature
> > true
> > 100
> > 
> > 
> >
> >
> > This is my pipeline for the field.
> >
> >   > positionIncrementGap="100">
> >
> >
> >
> > > segMode="SEARCH"/>
> >
> >
> >
> >
> >
> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> >
> > > words="stopwords.txt" />
> >
> > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >
> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
> >
> >
> >
> > > maxGramSize="15"/>
> >
> >
> >
> >
> >
> > > segMode="SEARCH"/>
> >
> >
> >
> >
> >
> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> >
> > > words="stopwords.txt" />
> >
> > > generateWordParts="0" generateNumberParts="0" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
> >
> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
> >
> >
> >
> > 
> >
> >  
> >
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com
>


Highlighting tag problem

2015-12-03 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.3.0.
I would like to find out why, during a search, there is sometimes a match in
content that is not highlighted (the word is not in the stopword list).
Did I make any mistakes in my configuration?

This is my highlighting request handler from solrconfig.xml.



explicit
10
json
true
text
id, title, content_type, last_modified, url, score 

on
id, title, content, author, tag
   true
true
html
200

true
signature
true
100




This is my pipeline for the field.

 


Regards,
Edwin


Re: Highlighting tag problem

2015-12-03 Thread Scott Stults
Edwin,

Is this related to what's described in SOLR-8334?


k/r,
Scott

On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi,
>
> I'm using Solr 5.3.0.
> I would like to find out why, during a search, there is sometimes a match in
> content that is not highlighted (the word is not in the stopword list).
> Did I make any mistakes in my configuration?
>
> This is my highlighting request handler from solrconfig.xml.
>
> 
> 
> explicit
> 10
> json
> true
> text
> id, title, content_type, last_modified, url, score 
>
> on
> id, title, content, author, tag
>true
> true
> html
> 200
>
> true
> signature
> true
> 100
> 
> 
>
>
> This is my pipeline for the field.
>
>   positionIncrementGap="100">
>
>
>
> segMode="SEARCH"/>
>
>
>
>
>
> words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>
> words="stopwords.txt" />
>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>
>
>
> maxGramSize="15"/>
>
>
>
>
>
> segMode="SEARCH"/>
>
>
>
>
>
> words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>
> words="stopwords.txt" />
>
> generateWordParts="0" generateNumberParts="0" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>
>
>
> 
>
>  
>
>
> Regards,
> Edwin
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


RE: Help With Phrase Highlighting

2015-12-03 Thread Teague James
Thanks everyone who replied! The FastVectorHighlighter did the trick. Here
is how I configured it:

In solrconfig.xml:
In the requestHandler I added:
on
text
true
100

In schema.xml:
I modified the text field:


I restarted Solr, re-indexed the documents and tested. All phrases are
correctly highlighted as phrases! Thanks everyone!

-Teague



Help With Phrase Highlighting

2015-12-01 Thread Teague James
Hello everyone,

I am having difficulty enabling phrase highlighting and am hoping someone
here can offer some help. This is what I have currently:

Solr 4.9
solrconfig.xml (partial snip)


xml
explicit
10
text
on
text
html
100





schema.xml (partial snip)

   

Query (partial snip):
...select?fq=id:43040="my%20search%20phrase"

Response (partial snip):
...

ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta
assentior. (my search


phrase facilitates highlighting). Et option molestiae referrentur
ius. Viris quaeque legimus an pri


The document in which this phrase is found is very long. If I reduce the
document to a single sentence, such as "My search phrase facilitates
highlighting" then the response I get from Solr is:

My search phrase facilitates highlighting


What I am trying to achieve instead, regardless of the document size, is:
My search phrase with a single indicator at the beginning
and end rather than three separate words that may get distributed between
two different snippets depending on the placement of the snippet in the
larger document.

I tried to follow this guide:
http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only/25970452#25970452
but got zero results. I suspect that
this is due to the hl parameters in my solrconfig file, but I cannot find
any specific guidance on what the correct parameters should be. I tried
commenting out all of the hl parameters and also got no results.

Can anyone offer any solutions for searching large documents and returning a
single phrase highlight?

-Teague



Re: Help With Phrase Highlighting

2015-12-01 Thread Philippe Soares
Hi,
Did you try hl.mergeContiguous=true ?

On Tue, Dec 1, 2015 at 3:36 PM, Teague James <teag...@insystechinc.com>
wrote:

> Hello everyone,
>
> I am having difficulty enabling phrase highlighting and am hoping someone
> here can offer some help. This is what I have currently:
>
> Solr 4.9
> solrconfig.xml (partial snip)
> 
> 
> xml
> explicit
> 10
> text
> on
> text
> html
> 100
> 
> 
> 
> 
>
> schema.xml (partial snip)
> required="true" multiValued="false" />
>
>
> Query (partial snip):
> ...select?fq=id:43040="my%20search%20phrase"
>
> Response (partial snip):
> ...
> 
> ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta
> assentior. (my search
> 
> 
> phrase facilitates highlighting). Et option molestiae referrentur
> ius. Viris quaeque legimus an pri
> 
>
> The document in which this phrase is found is very long. If I reduce the
> document to a single sentence, such as "My search phrase facilitates
> highlighting" then the response I get from Solr is:
> 
> My search phrase facilitates highlighting
> 
>
> What I am trying to achieve instead, regardless of the document size, is:
> My search phrase with a single indicator at the beginning
> and end rather than three separate words that may get distributed between
> two different snippets depending on the placement of the snippet in the
> larger document.
>
> I tried to follow this guide:
> http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only/25970452#25970452
> but got zero results. I suspect that
> this is due to the hl parameters in my solrconfig file, but I cannot find
> any specific guidance on what the correct parameters should be. I tried
> commenting out all of the hl parameters and also got no results.
>
> Can anyone offer any solutions for searching large documents and returning
> a
> single phrase highlight?
>
> -Teague
>
>


-- 
Philippe Soares, Senior Developer | +1 508 599 3963
GQ Life Sciences, Inc. www.gqlifesciences.com


Re: Help With Phrase Highlighting

2015-12-01 Thread Teague James
Hello,

Thanks for replying! I tried using it in a query string, but without success. 
Should I add it to my solrconfig? If so, are there any other hl parameters that 
are necessary? 

-Teague

> On Dec 1, 2015, at 9:01 PM, Philippe Soares <soa...@gqlifesciences.com> wrote:
> 
> Hi,
> Did you try hl.mergeContiguous=true ?
> 
> On Tue, Dec 1, 2015 at 3:36 PM, Teague James <teag...@insystechinc.com>
> wrote:
> 
>> Hello everyone,
>> 
>> I am having difficulty enabling phrase highlighting and am hoping someone
>> here can offer some help. This is what I have currently:
>> 
>> Solr 4.9
>> solrconfig.xml (partial snip)
>> 
>>
>>xml
>>explicit
>>10
>>text
>>on
>>text
>>html
>>100
>>
>>
>>
>> 
>> 
>> schema.xml (partial snip)
>>   > required="true" multiValued="false" />
>>   
>> 
>> Query (partial snip):
>> ...select?fq=id:43040="my%20search%20phrase"
>> 
>> Response (partial snip):
>> ...
>> 
>> ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta
>> assentior. (my search
>> 
>> 
>> phrase facilitates highlighting). Et option molestiae referrentur
>> ius. Viris quaeque legimus an pri
>> 
>> 
>> The document in which this phrase is found is very long. If I reduce the
>> document to a single sentence, such as "My search phrase facilitates
>> highlighting" then the response I get from Solr is:
>> 
>> My search phrase facilitates highlighting
>> 
>> 
>> What I am trying to achieve instead, regardless of the document size, is:
>> My search phrase with a single indicator at the beginning
>> and end rather than three separate words that may get distributed between
>> two different snippets depending on the placement of the snippet in the
>> larger document.
>> 
>> I tried to follow this guide:
>> http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only/25970452#25970452
>> but got zero results. I suspect that
>> this is due to the hl parameters in my solrconfig file, but I cannot find
>> any specific guidance on what the correct parameters should be. I tried
>> commenting out all of the hl parameters and also got no results.
>> 
>> Can anyone offer any solutions for searching large documents and returning
>> a
>> single phrase highlight?
>> 
>> -Teague
> 
> 
> -- 
> Philippe Soares, Senior Developer | +1 508 599 3963
> GQ Life Sciences, Inc. www.gqlifesciences.com


Re: Help With Phrase Highlighting

2015-12-01 Thread Koji Sekiguchi

Hi Teague,

I couldn't understand the "document size" part of your question, but if you'd
like Solr to return the snippet

<em>My search phrase</em>

instead of

<em>My</em> <em>search</em> <em>phrase</em>

you should use FastVectorHighlighter. To use FVH, your highlight field (hl.fl=text)
needs to be indexed with the options termVectors=true, termPositions=true and
termOffsets=true.

Good luck!

Koji
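
As a rough illustration of the request side of this advice, the sketch below assembles the highlighting parameters for one field. The class and helper names are made up; hl, hl.fl and hl.useFastVectorHighlighter are the Solr request parameters as I understand them for the 4.x/5.x versions discussed here, so please check them against the documentation for your version. The schema change (term vectors, positions and offsets on the field, followed by a re-index) is still required and is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FvhParamsSketch {

    // Request parameters asking Solr to use the FastVectorHighlighter on one field.
    static Map<String, String> fvhParams(String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("hl", "true");
        params.put("hl.fl", field);
        params.put("hl.useFastVectorHighlighter", "true");
        return params;
    }

    // Joins parameters as key=value pairs; the values used here need no URL escaping.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Append the printed string to the rest of the select query.
        System.out.println(toQueryString(fvhParams("text")));
    }
}
```

The same three settings could instead go into the request handler defaults in solrconfig.xml.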


On 2015/12/02 5:36, Teague James wrote:

Hello everyone,

I am having difficulty enabling phrase highlighting and am hoping someone
here can offer some help. This is what I have currently:

Solr 4.9
solrconfig.xml (partial snip)


xml
explicit
10
text
on
text
html
100





schema.xml (partial snip)



Query (partial snip):
...select?fq=id:43040="my%20search%20phrase"

Response (partial snip):
...

ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta
assentior. (my search


phrase facilitates highlighting). Et option molestiae referrentur
ius. Viris quaeque legimus an pri


The document in which this phrase is found is very long. If I reduce the
document to a single sentence, such as "My search phrase facilitates
highlighting" then the response I get from Solr is:

My search phrase facilitates highlighting


What I am trying to achieve instead, regardless of the document size, is:
My search phrase with a single indicator at the beginning
and end rather than three separate words that may get distributed between
two different snippets depending on the placement of the snippet in the
larger document.

I tried to follow this guide:
http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only/25970452#25970452
but got zero results. I suspect that
this is due to the hl parameters in my solrconfig file, but I cannot find
any specific guidance on what the correct parameters should be. I tried
commenting out all of the hl parameters and also got no results.

Can anyone offer any solutions for searching large documents and returning a
single phrase highlight?

-Teague






Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-11-23 Thread Zheng Lin Edwin Yeo
Hi Scott,

I've created a Jira issue for this: SOLR-8334.

Regards,
Edwin


On 24 November 2015 at 00:36, Scott Stults <
sstu...@opensourceconnections.com> wrote:

> Edwin,
>
> Congrats on getting it to work! Would you please create a Jira issue for
> this and add the patch? You won't need the inline change comments -- a good
> description in the ticket itself will work best.
>
> k/r,
> Scott
>
> On Sun, Nov 22, 2015 at 10:13 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> wrote:
>
> > I've made a minor modification to the code in JiebaSegmenter.java, and the
> > highlighting seems to be fine now.
> >
> > Basically, I created another int called offset2 in the process() method.
> > int offset2 = 0;
> >
> > Then I changed offset to offset2 in this part of the code in the
> > process() method.
> >
> > if (sb.length() > 0)
> >     if (mode == SegMode.SEARCH) {
> >         for (Word token : sentenceProcess(sb.toString())) {
> >             // tokens.add(new SegToken(token, offset, offset += token.length()));
> >             tokens.add(new SegToken(token, offset2, offset2 += token.length())); // Change to offset2 by Edwin
> >         }
> >     } else {
> >         for (Word token : sentenceProcess(sb.toString())) {
> >             if (token.length() > 2) {
> >                 Word gram2;
> >                 int j = 0;
> >                 for (; j < token.length() - 1; ++j) {
> >                     gram2 = token.subSequence(j, j + 2);
> >                     if (wordDict.containsWord(gram2.getToken()))
> >                         // tokens.add(new SegToken(gram2, offset + j, offset + j + 2));
> >                         tokens.add(new SegToken(gram2, offset2 + j, offset2 + j + 2)); // Change to offset2 by Edwin
> >                 }
> >             }
> >             if (token.length() > 3) {
> >                 Word gram3;
> >                 int j = 0;
> >                 for (; j < token.length() - 2; ++j) {
> >                     gram3 = token.subSequence(j, j + 3);
> >                     if (wordDict.containsWord(gram3.getToken()))
> >                         // tokens.add(new SegToken(gram3, offset + j, offset + j + 3));
> >                         tokens.add(new SegToken(gram3, offset2 + j, offset2 + j + 3)); // Change to offset2 by Edwin
> >                 }
> >             }
> >             // tokens.add(new SegToken(token, offset, offset += token.length()));
> >             tokens.add(new SegToken(token, offset2, offset2 += token.length())); // Change to offset2 by Edwin
> >         }
> >     }
> >
> >
> > Not sure if this is just a workaround, or can be used as a permanent
> > solution
> >
> > Regards,
> > Edwin
> >
> >
> > On 28 October 2015 at 15:29, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > wrote:
> >
> > > Hi Scott,
> > >
> > > I have tried editing the SegToken.java file in the jieba-analysis-1.0.0
> > > package, adding +1 to both the startOffset and endOffset values (see code
> > > below), and now the highlight tag in the content is shifted to the correct
> > > place. However, this means that the title and other fields, where the
> > > highlight tag was originally in the correct place, now get the
> > > "org.apache.lucene.search.highlight.InvalidTokenOffsetsException"
> > > exception. For now I am temporarily using another tokenizer for the other
> > > fields.
> > >
> > > public SegToken(Word word, int startOffset, int endOffset) {
> > > this.word = word;
> > > this.startOffset = startOffset+1;
> > > this.endOffset = endOffset+1;
> > > }
> > >
> > > However, I don't think this can be a permanent solution, so I'm trying to
> > > dig further into the code to see what the difference is between the
> > > content field and the other fields.
> > >
> > > I have also found that although JiebaTokenizer works better for Chinese
> > > characters, it doesn't work well for English characters. For example, if I
> > > search for &quo

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-11-23 Thread Scott Stults
Edwin,

Congrats on getting it to work! Would you please create a Jira issue for
this and add the patch? You won't need the inline change comments -- a good
description in the ticket itself will work best.

k/r,
Scott
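
For anyone reading the quoted change below without the Jieba sources at hand, here is a small sketch of why token offsets matter to the highlighter. It is not the Jieba code: Span is a hypothetical stand-in for SegToken, and the model assumes the tokens cover the text with no gaps. The point is only that each token's start and end must line up with its position in the original text, which a counter starting at 0 and advancing by each token's length achieves, while a constant shift (like the earlier +1 experiment) does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TokenOffsetSketch {

    // Hypothetical stand-in for SegToken: a token and its character span.
    static final class Span {
        final String token;
        final int start;
        final int end;

        Span(String token, int start, int end) {
            this.token = token;
            this.start = start;
            this.end = end;
        }
    }

    // Assigns spans with a running counter that starts at 0 and advances by
    // each token's length; shift is added to both ends to model a bad offset.
    static List<Span> assignOffsets(List<String> tokens, int shift) {
        List<Span> spans = new ArrayList<>();
        int offset = 0;
        for (String token : tokens) {
            int start = offset;
            offset += token.length();
            spans.add(new Span(token, start + shift, offset + shift));
        }
        return spans;
    }

    // True when every span lies inside the text and covers exactly its token.
    static boolean spansMatch(String text, List<Span> spans) {
        for (Span s : spans) {
            if (s.start < 0 || s.end > text.length()
                    || !text.substring(s.start, s.end).equals(s.token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "mysearchphrase";
        List<String> tokens = Arrays.asList("my", "search", "phrase");
        System.out.println(spansMatch(text, assignOffsets(tokens, 0))); // offsets line up with the text
        System.out.println(spansMatch(text, assignOffsets(tokens, 1))); // every span shifted by one
    }
}
```

A check like spansMatch could serve as a quick sanity test on a tokenizer's output before blaming the highlighter.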

On Sun, Nov 22, 2015 at 10:13 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> I've tried to do some minor modification in the code under
> JiebaSegmenter.java, and the highlighting seems to be fine now.
>
> Basically, I created another int called offset2 under process() method.
> int offset2 = 0;
>
> Then I modified the offset to offset2 for this part of the code under
> process() method.
>
> if (sb.length() > 0)
> if (mode == SegMode.SEARCH) {
> for (Word token : sentenceProcess(sb.toString())) {
> // tokens.add(new SegToken(token, offset, offset +=
> token.length()));
> tokens.add(new SegToken(token, offset2, offset2 +=
> token.length())); // Change to offset2 by Edwin
> }
> } else {
> for (Word token : sentenceProcess(sb.toString())) {
> if (token.length() > 2) {
> Word gram2;
> int j = 0;
> for (; j < token.length() - 1; ++j) {
> gram2 = token.subSequence(j, j + 2);
> if (wordDict.containsWord(gram2.getToken()))
> // tokens.add(new SegToken(gram2, offset +
> j, offset + j + 2));
> tokens.add(new SegToken(gram2, offset2 + j,
> offset2 + j + 2));  // Change to offset2 by Edwin
> }
> }
> if (token.length() > 3) {
> Word gram3;
> int j = 0;
> for (; j < token.length() - 2; ++j) {
> gram3 = token.subSequence(j, j + 3);
> if (wordDict.containsWord(gram3.getToken()))
> // tokens.add(new SegToken(gram3, offset +
> j, offset + j + 3));
> tokens.add(new SegToken(gram3, offset2 + j,
> offset2 + j + 3));  // Change to offset2 by Edwin
> }
> }
> // tokens.add(new SegToken(token, offset, offset +=
> token.length()));
> tokens.add(new SegToken(token, offset2, offset2 +=
> token.length()));// Change to offset2 by Edwin
> }
> }
>
>
> Not sure if this is just a workaround, or can be used as a permanent
> solution
>
> Regards,
> Edwin
>
>
> On 28 October 2015 at 15:29, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
> > Hi Scott,
> >
> > I have tried to edit the SegToken.java file in the jieba-analysis-1.0.0
> > package with a +1 at both the startOffset and endOffset value (see code
> > below), and now the  tag of the content is shifted to the correct
> place
> > at the content. However, this means that in the title and other fields
> > where the  tag is orignally at the correct place, they will get the
> "org.apache.lucene.search.highlight.InvalidTokenOffsetsException"
> > exception. I have temporary use another tokenizer for the other fields
> > first.
> >
> > public SegToken(Word word, int startOffset, int endOffset) {
> > this.word = word;
> > this.startOffset = startOffset+1;
> > this.endOffset = endOffset+1;
> > }
> >
> > However, I don't think this can be a permanent solution, so I'm trying to
> > zoom in further to the code, to see what's the difference with the
> content
> > and other fields.
> >
> > I have also find that althought JiebaTokenizer works better for Chinese
> > characters, it doesn't work well for English characters. For example, if
> I
> > search for "water", the JiebaTokenizer will cut it as follow:
> > w|at|er
> > It can't cut it as a full word, which HMMChineseTokenizer is able to.
> >
> > Here's my configuration in schema.xml:
> >
> >  > positionIncrementGap="100">
> >  
> >  >  segMode="SEARCH"/>
> > 
> > 
> >  > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> > 
> >  > maxGramSize="15"/>
> >  
> >  
> >  >  segMode="SEARCH"/>
> > 
> > 
> >  > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> > 
> >  

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-11-22 Thread Zheng Lin Edwin Yeo
I've made a minor modification to the code in JiebaSegmenter.java, and
the highlighting seems to be fine now.

Basically, I declared another int called offset2 in the process() method:
int offset2 = 0;

Then I replaced offset with offset2 in this part of the process()
method:

if (sb.length() > 0)
    if (mode == SegMode.SEARCH) {
        for (Word token : sentenceProcess(sb.toString())) {
            // tokens.add(new SegToken(token, offset, offset += token.length()));
            tokens.add(new SegToken(token, offset2, offset2 += token.length())); // Change to offset2 by Edwin
        }
    } else {
        for (Word token : sentenceProcess(sb.toString())) {
            if (token.length() > 2) {
                Word gram2;
                int j = 0;
                for (; j < token.length() - 1; ++j) {
                    gram2 = token.subSequence(j, j + 2);
                    if (wordDict.containsWord(gram2.getToken()))
                        // tokens.add(new SegToken(gram2, offset + j, offset + j + 2));
                        tokens.add(new SegToken(gram2, offset2 + j, offset2 + j + 2)); // Change to offset2 by Edwin
                }
            }
            if (token.length() > 3) {
                Word gram3;
                int j = 0;
                for (; j < token.length() - 2; ++j) {
                    gram3 = token.subSequence(j, j + 3);
                    if (wordDict.containsWord(gram3.getToken()))
                        // tokens.add(new SegToken(gram3, offset + j, offset + j + 3));
                        tokens.add(new SegToken(gram3, offset2 + j, offset2 + j + 3)); // Change to offset2 by Edwin
                }
            }
            // tokens.add(new SegToken(token, offset, offset += token.length()));
            tokens.add(new SegToken(token, offset2, offset2 += token.length())); // Change to offset2 by Edwin
        }
    }


I'm not sure whether this is just a workaround or can be used as a
permanent solution.
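
For anyone following along, here is a minimal sketch of the invariant the
highlighter depends on: each token's start and end offsets must cut that
token's own text out of the original string. OffsetToken, assignOffsets and
offsetsMatch are hypothetical names made up for illustration (they are not
jieba-analysis or Lucene classes), and the segmentation of the sample
sentence is assumed. The running counter is only in the spirit of the
offset2 change above, not a copy of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a token with character offsets (not jieba's SegToken). */
final class OffsetToken {
    final String text;
    final int start; // inclusive
    final int end;   // exclusive

    OffsetToken(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }
}

final class OffsetSketch {
    /** Assigns offsets with a running counter that advances by each word's length. */
    static List<OffsetToken> assignOffsets(List<String> words) {
        List<OffsetToken> tokens = new ArrayList<>();
        int offset = 0;
        for (String w : words) {
            // Arguments are evaluated left to right: start is the old value, end the new one.
            tokens.add(new OffsetToken(w, offset, offset += w.length()));
        }
        return tokens;
    }

    /** True when every token's offsets select exactly that token's text. */
    static boolean offsetsMatch(String original, List<OffsetToken> tokens) {
        for (OffsetToken t : tokens) {
            if (t.start < 0 || t.start > t.end || t.end > original.length()) {
                return false;
            }
            if (!original.substring(t.start, t.end).equals(t.text)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String sentence = "自然環境与企業本身";
        // Assumed segmentation, for illustration only.
        List<String> words = Arrays.asList("自然", "環境", "与", "企業", "本身");

        System.out.println(offsetsMatch(sentence, assignOffsets(words)));

        // A token whose offsets are off by one selects the wrong characters.
        List<OffsetToken> shifted = Arrays.asList(new OffsetToken("自然", 1, 3));
        System.out.println(offsetsMatch(sentence, shifted));
    }
}
```

A check like offsetsMatch could be run against the tokens of a short test
string to confirm whether a change to the offsets is correct for every
field, rather than only for the content field.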

Regards,
Edwin


On 28 October 2015 at 15:29, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi Scott,
>
> I have tried to edit the SegToken.java file in the jieba-analysis-1.0.0
> package with a +1 at both the startOffset and endOffset value (see code
> below), and now the  tag of the content is shifted to the correct place
> at the content. However, this means that in the title and other fields
> where the  tag is orignally at the correct place, they will get the 
> "org.apache.lucene.search.highlight.InvalidTokenOffsetsException"
> exception. I have temporary use another tokenizer for the other fields
> first.
>
> public SegToken(Word word, int startOffset, int endOffset) {
> this.word = word;
> this.startOffset = startOffset+1;
> this.endOffset = endOffset+1;
> }
>
> However, I don't think this can be a permanent solution, so I'm trying to
> zoom in further to the code, to see what's the difference with the content
> and other fields.
>
> I have also find that althought JiebaTokenizer works better for Chinese
> characters, it doesn't work well for English characters. For example, if I
> search for "water", the JiebaTokenizer will cut it as follow:
> w|at|er
> It can't cut it as a full word, which HMMChineseTokenizer is able to.
>
> Here's my configuration in schema.xml:
>
>  positionIncrementGap="100">
>  
>   segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> 
>  maxGramSize="15"/>
>  
>  
>   segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> 
>   
>   
>
> Does anyone knows if JiebaTokenizer is optimised to take in English
> characters as well?
>
> Regards,
> Edwin
>
>
> On 27 October 2015 at 15:57, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
>> Hi Scott,
>>
>> Thank you for providing the links and references. Will look through them,
>> and let you know if I find any solutions or workaround.
>>
>> Regards,
>> Edwin
>>
>>
>> On 27 October 2015 at 11:13, Scott Chu <scott@udngroup.com> wrote:
>>
>>>
>>> Take a look at Michael's 2 articles, they might help you calrify the
>>> idea of highlighting in Solr:
>>>
>>> Changing Bits: Lucene's TokenStreams are actually graphs!
>>>
>>> http://blog.mikemccandless.com/20

Re: highlighting on child document

2015-11-08 Thread Mikhail Khludnev
On Thu, Nov 5, 2015 at 12:12 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

>
> Highlighter for block join hasn't been implemented.


Here I'm wrong:
 https://issues.apache.org/jira/browse/LUCENE-5929



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: highlighting on child document

2015-11-08 Thread Yangrui Guo
But how does highlighting work with a block join query? Do I need to supply
an additional parameter?

Yangrui

On Sun, Nov 8, 2015 at 12:45 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> On Thu, Nov 5, 2015 at 12:12 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> >
> > Highlighter for block join hasn't been implemented.
>
>
> Here I'm wrong:
>  https://issues.apache.org/jira/browse/LUCENE-5929
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
>


Re: highlighting on child document

2015-11-05 Thread Yangrui Guo
So if child document highlighting doesn't work, how can I get Solr to tell
me which child document and which of its fields matched?

On Wednesday, November 4, 2015, Mikhail Khludnev <mkhlud...@griddynamics.com>
wrote:

> Hello,
>
> Highlighter for block join hasn't been implemented. So, far you can call
> highlighter with children query also passing fq={!child
> ..}parent-id:.
>
> On Wed, Nov 4, 2015 at 7:57 PM, Yangrui Guo <guoyang...@gmail.com> wrote:
>
> > Hi
> >
> > I want to highlight matched terms on child documents because I need to
> > determine which field matched the search terms. However when I use block
> > join solr returned empty highlight fields. How can I use highlight with
> > nested document? Or is there anyway to tell which field matched the query
> > terms?
> >
> > Yangrui
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
>


Re: highlighting on child document

2015-11-04 Thread Mikhail Khludnev
Hello,

The highlighter for block join hasn't been implemented. So far, you can call
the highlighter with the children query, also passing fq={!child
..}parent-id:.
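
A rough sketch of that second request, assuming a parent filter of
doc_type:parent, a parent key field named id and a child field named
comment_text. All three names are hypothetical, since the thread does not
give them. The code only builds the parameter string: the children query
goes in q, the {!child of=...} filter restricts the hits to children of one
parent, and hl.fl=* is one possible way to ask for highlighting on whichever
fields matched.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class ChildHighlightQuery {
    /**
     * Builds URL-encoded request parameters for highlighting child documents.
     * The field names passed in are the caller's assumptions about the schema.
     */
    static String build(String childQuery, String parentFilter, String parentId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", childQuery);
        // Restrict results to the children of a single parent document.
        params.put("fq", "{!child of=\"" + parentFilter + "\"}id:" + parentId);
        params.put("hl", "true");
        params.put("hl.fl", "*");

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey())
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("comment_text:solr", "doc_type:parent", "42"));
    }
}
```

The resulting string would be appended to the select handler URL of the
collection. The field names in the highlighting section of the response then
show which child fields matched.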

On Wed, Nov 4, 2015 at 7:57 PM, Yangrui Guo  wrote:

> Hi
>
> I want to highlight matched terms on child documents because I need to
> determine which field matched the search terms. However when I use block
> join solr returned empty highlight fields. How can I use highlight with
> nested document? Or is there anyway to tell which field matched the query
> terms?
>
> Yangrui
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





highlighting on child document

2015-11-04 Thread Yangrui Guo
Hi

I want to highlight matched terms on child documents because I need to
determine which field matched the search terms. However, when I use a block
join, Solr returns empty highlight fields. How can I use highlighting with
nested documents? Or is there any way to tell which field matched the query
terms?

Yangrui


Re: highlighting on child document

2015-11-04 Thread Alessandro Benedetti
My colleagues will correct me if I am wrong.
A Solr join is not actually the same as a relational join.
This means that the result can contain only one layer of entities (the
parent layer or the child layer), even if your original search was on a
different layer.
You can search on children and return all the parents (and vice versa), but
you cannot return both children and parents together.
Because of this, I believe highlighting content that was in the children
could be a problem.

As nested faceting is becoming more complete, I believe nested highlighting
could soon become a priority as well.
I encourage anyone who has a more up-to-date view on this to correct me.

Cheers

On 4 November 2015 at 16:57, Yangrui Guo <guoyang...@gmail.com> wrote:

> Hi
>
> I want to highlight matched terms on child documents because I need to
> determine which field matched the search terms. However when I use block
> join solr returned empty highlight fields. How can I use highlight with
> nested document? Or is there anyway to tell which field matched the query
> terms?
>
> Yangrui
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-28 Thread Zheng Lin Edwin Yeo
Hi Scott,

I have tried editing the SegToken.java file in the jieba-analysis-1.0.0
package to add +1 to both the startOffset and endOffset values (see code
below), and now the highlight tag in the content field is shifted to the
correct place. However, this means that the title and other fields, where
the highlight tag was originally in the correct place, now throw the
"org.apache.lucene.search.highlight.InvalidTokenOffsetsException"
exception. I am temporarily using another tokenizer for the other fields
for now.

public SegToken(Word word, int startOffset, int endOffset) {
this.word = word;
this.startOffset = startOffset+1;
this.endOffset = endOffset+1;
}

However, I don't think this can be a permanent solution, so I'm digging
further into the code to see what differs between the content field and the
other fields.
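
To illustrate why a blanket +1 cannot work for fields whose offsets were
already correct: shifting the last token of a text by one pushes its end
offset past the end of the string. As far as I understand it, that is the
kind of condition the highlighter rejects with InvalidTokenOffsetsException.
The withinText check below is a hypothetical stand-in written for this
sketch, not Lucene's own code, and the sample title is made up.

```java
final class ShiftSketch {
    /** Stand-in bounds check: offsets must lie inside the text. */
    static boolean withinText(int start, int end, int textLength) {
        return start >= 0 && start <= end && end <= textLength;
    }

    public static void main(String[] args) {
        String title = "clean water";  // hypothetical field value, length 11
        int start = 6;                 // correct start offset of "water"
        int end = 11;                  // correct end offset of "water"

        System.out.println(withinText(start, end, title.length()));         // true
        System.out.println(withinText(start + 1, end + 1, title.length())); // false
    }
}
```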

I have also found that although JiebaTokenizer works better for Chinese
characters, it doesn't work well for English characters. For example, if I
search for "water", JiebaTokenizer cuts it as follows:
w|at|er
It can't keep it as a full word, which HMMChineseTokenizer is able to do.

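Here's my configuration in schema.xml:

[fieldType XML stripped by the list archive; surviving attributes:
positionIncrementGap="100", segMode="SEARCH" (twice),
words="org/apache/lucene/analysis/cn/smart/stopwords.txt" (twice),
maxGramSize="15"]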

Does anyone know whether JiebaTokenizer is optimised to handle English
characters as well?

Regards,
Edwin


On 27 October 2015 at 15:57, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi Scott,
>
> Thank you for providing the links and references. Will look through them,
> and let you know if I find any solutions or workaround.
>
> Regards,
> Edwin
>
>
> On 27 October 2015 at 11:13, Scott Chu <scott@udngroup.com> wrote:
>
>>
>> Take a look at Michael's 2 articles, they might help you calrify the idea
>> of highlighting in Solr:
>>
>> Changing Bits: Lucene's TokenStreams are actually graphs!
>>
>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>>
>> Also take a look at 4th paragraph In his another article:
>>
>> Changing Bits: A new Lucene highlighter is born
>>
>> http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html
>>
>> Currently, I can't figure out the possible cause of your problem unless I
>> got spare time to test it on my own, which is not available these days (Got
>> some projects to close)!
>>
>> If you find the solution or workaround, pls. let us know. Good luck again!
>>
>> Scott Chu,scott@udngroup.com
>> 2015/10/27
>>
>> - Original Message -
>> *From: *Scott Chu <scott@udngroup.com>
>> *To: *solr-user <solr-user@lucene.apache.org>
>> *Date: *2015-10-27, 10:27:45
>> *Subject: *Re: Highlighting content field problem when using
>> JiebaTokenizerFactory
>>
>> Hi Edward,
>>
>> Took a lot of time to see if there's anything can help you to define
>> the cause of your problem. Maybe this might help you a bit:
>>
>> [SOLR-4722] Highlighter which generates a list of query term position(s)
>> for each item in a list of documents, or returns null if highlighting is
>> disabled. - AS...
>> https://issues.apache.org/jira/browse/SOLR-4722
>>
>> This one is modified from FastVectorHighLighter, so ensure those 3 term*
>> attributes are on.
>>
>> Scott Chu,scott@udngroup.com
>> 2015/10/27
>>
>> - Original Message -
>> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> *To: *solr-user <solr-user@lucene.apache.org>
>> *Date: *2015-10-23, 10:42:32
>> *Subject: *Re: Highlighting content field problem when using
>> JiebaTokenizerFactory
>>
>> Hi Scott,
>>
>> Thank you for your respond.
>>
>> 1. You said the problem only happens on "contents" field, so maybe
>> there're
>> something wrong with the contents of that field. Doe it contain any
>> special
>> thing in them, e.g. HTML tags or symbols. I recall SOLR-42 mentions
>> something about HTML stripping will cause highlight problem. Maybe you can
>>
>> try purify that fields to be closed to pure text and see if highlight
>> comes
>> ok.
>> *A) I check that the SOLR-42 is mentioning about the
>> HTMLStripWhiteSpaceTokenizerFactory, which I'm not using. I believe that
>> tokenizer is already deprecated too. I've tried with all kinds of content
>> for rich-text documents, and all of them have the same problem.*
>>
>> 2. Maybe something imcompatible between JiebaTokenizer and Solr
>> highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
>> SmartChinese (I don't use this since I am dealing with Traditional Chinese
>>

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-27 Thread Zheng Lin Edwin Yeo
Hi Scott,

Thank you for providing the links and references. I will look through them
and let you know if I find any solution or workaround.

Regards,
Edwin


On 27 October 2015 at 11:13, Scott Chu <scott@udngroup.com> wrote:

>
> Take a look at Michael's 2 articles, they might help you calrify the idea
> of highlighting in Solr:
>
> Changing Bits: Lucene's TokenStreams are actually graphs!
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>
> Also take a look at 4th paragraph In his another article:
>
> Changing Bits: A new Lucene highlighter is born
>
> http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html
>
> Currently, I can't figure out the possible cause of your problem unless I
> got spare time to test it on my own, which is not available these days (Got
> some projects to close)!
>
> If you find the solution or workaround, pls. let us know. Good luck again!
>
> Scott Chu,scott@udngroup.com
> 2015/10/27
>
> - Original Message -
> *From: *Scott Chu <scott@udngroup.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-27, 10:27:45
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Edward,
>
> Took a lot of time to see if there's anything can help you to define
> the cause of your problem. Maybe this might help you a bit:
>
> [SOLR-4722] Highlighter which generates a list of query term position(s)
> for each item in a list of documents, or returns null if highlighting is
> disabled. - AS...
> https://issues.apache.org/jira/browse/SOLR-4722
>
> This one is modified from FastVectorHighLighter, so ensure those 3 term*
> attributes are on.
>
> Scott Chu,scott@udngroup.com
> 2015/10/27
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-23, 10:42:32
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Scott,
>
> Thank you for your respond.
>
> 1. You said the problem only happens on "contents" field, so maybe there're
> something wrong with the contents of that field. Doe it contain any special
> thing in them, e.g. HTML tags or symbols. I recall SOLR-42 mentions
> something about HTML stripping will cause highlight problem. Maybe you can
>
> try purify that fields to be closed to pure text and see if highlight comes
> ok.
> *A) I check that the SOLR-42 is mentioning about the
> HTMLStripWhiteSpaceTokenizerFactory, which I'm not using. I believe that
> tokenizer is already deprecated too. I've tried with all kinds of content
> for rich-text documents, and all of them have the same problem.*
>
> 2. Maybe something imcompatible between JiebaTokenizer and Solr
> highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
> SmartChinese (I don't use this since I am dealing with Traditional Chinese
>
> but I see you are dealing with Simplified Chinese), or 3rd-party MMSeg and
>
> see if the problem goes away. However when I'm googling similar problem, I
>
> saw you asked same question on August at Huaban/Jieba-analysis and somebody
> said he also uses JiebaTokenizer but he doesn't have your problem. So I see
> this could be less suspect.
> *A) I was thinking about the incompatible issue too, as I previously
> thought that JiebaTokenizer is optimised for Solr 4.x, so it may have issue
> in 5.x. But the person from Hunban/Jieba-analysis said that he doesn't have
> this problem in Solr 5.1. I also face the same problem in Solr 5.1, and
> although I'm using Solr 5.3.0 now, the same problem persist. *
>
> I'm looking at the indexing process too, to see if there's any problem
> there. But just can't figure out why it only happen to JiebaTokenizer, and
>
> it only happen for content field.
>
>
> Regards,
> Edwin
>
>
> On 23 October 2015 at 09:41, Scott Chu <scott@udngroup.com> wrote:
>
> > Hi Edwin,
> >
> > Since you've tested all my suggestions and the problem is still there, I
>
> > can't think of anything wrong with your configuration. Now I can only
> > suspect two things:
> >
> > 1. You said the problem only happens on "contents" field, so maybe
> > there're something wrong with the contents of that field. Doe it contain
>
> > any special thing in them, e.g. HTML tags or symbols. I recall SOLR-42
> > mentions something about HTML stripping will cause highlight problem.
> Maybe
> > you can try purify that fields to be closed to pure text and see if
> > highligh

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-26 Thread Scott Chu

Take a look at Michael's two articles; they might help clarify how
highlighting works in Solr:

Changing Bits: Lucene's TokenStreams are actually graphs!
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html

Also take a look at the 4th paragraph of another of his articles:

Changing Bits: A new Lucene highlighter is born
http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html

At the moment I can't work out the possible cause of your problem unless I
find spare time to test it myself, which I don't have these days (I've got
some projects to close)!

If you find a solution or workaround, please let us know. Good luck again!

Scott Chu,scott@udngroup.com
2015/10/27 
- Original Message - 
From: Scott Chu 
To: solr-user 
Date: 2015-10-27, 10:27:45
Subject: Re: Highlighting content field problem when using JiebaTokenizerFactory


Hi Edward,

Took a lot of time to see if there's anything can help you to define the 
cause of your problem. Maybe this might help you a bit: 

[SOLR-4722] Highlighter which generates a list of query term position(s) for 
each item in a list of documents, or returns null if highlighting is disabled. 
- AS...
https://issues.apache.org/jira/browse/SOLR-4722

This one is modified from FastVectorHighLighter, so ensure those 3 term* 
attributes are on.

Scott Chu,scott@udngroup.com
2015/10/27 
- Original Message - 
From: Zheng Lin Edwin Yeo 
To: solr-user 
Date: 2015-10-23, 10:42:32
Subject: Re: Highlighting content field problem when using JiebaTokenizerFactory


Hi Scott,

Thank you for your respond.

1. You said the problem only happens on "contents" field, so maybe there're
something wrong with the contents of that field. Doe it contain any special
thing in them, e.g. HTML tags or symbols. I recall SOLR-42 mentions
something about HTML stripping will cause highlight problem. Maybe you can

try purify that fields to be closed to pure text and see if highlight comes
ok.
*A) I check that the SOLR-42 is mentioning about the
HTMLStripWhiteSpaceTokenizerFactory, which I'm not using. I believe that
tokenizer is already deprecated too. I've tried with all kinds of content
for rich-text documents, and all of them have the same problem.*

2. Maybe something imcompatible between JiebaTokenizer and Solr
highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
SmartChinese (I don't use this since I am dealing with Traditional Chinese

but I see you are dealing with Simplified Chinese), or 3rd-party MMSeg and

see if the problem goes away. However when I'm googling similar problem, I

saw you asked same question on August at Huaban/Jieba-analysis and somebody
said he also uses JiebaTokenizer but he doesn't have your problem. So I see
this could be less suspect.
*A) I was thinking about the incompatible issue too, as I previously
thought that JiebaTokenizer is optimised for Solr 4.x, so it may have issue
in 5.x. But the person from Hunban/Jieba-analysis said that he doesn't have
this problem in Solr 5.1. I also face the same problem in Solr 5.1, and
although I'm using Solr 5.3.0 now, the same problem persist. *

I'm looking at the indexing process too, to see if there's any problem
there. But just can't figure out why it only happen to JiebaTokenizer, and

it only happen for content field.


Regards,
Edwin


On 23 October 2015 at 09:41, Scott Chu <scott@udngroup.com> wrote:

> Hi Edwin,
>
> Since you've tested all my suggestions and the problem is still there, I

> can't think of anything wrong with your configuration. Now I can only
> suspect two things:
>
> 1. You said the problem only happens on "contents" field, so maybe
> there're something wrong with the contents of that field. Doe it contain

> any special thing in them, e.g. HTML tags or symbols. I recall SOLR-42
> mentions something about HTML stripping will cause highlight problem. Maybe
> you can try purify that fields to be closed to pure text and see if
> highlight comes ok.
>
> 2. Maybe something imcompatible between JiebaTokenizer and Solr
> highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
> SmartChinese (I don't use this since I am dealing with Traditional Chinese
> but I see you are dealing with Simplified Chinese), or 3rd-party MMSeg and
> see if the problem goes away. However when I'm googling similar problem, I
> saw you asked same question on August at Huaban/Jieba-analysis and somebody
> said he also uses JiebaTokenizer but he doesn't have your problem. So I see
> this could be less suspect.
>
> The theory of your problem could be something in indexing process causes

> wrong position info. for that field and when Solr do highlighting, it
> retrieves wrong position info. and mark wrong position of highlight target
> terms.
>
> Scott Chu,scott@udngroup.com
> 2015/10/23
>
> - Original Message -

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-26 Thread Scott Chu
Hi Edwin,

I spent a lot of time looking for anything that could help you pin down the
cause of your problem. Maybe this will help a bit:

[SOLR-4722] Highlighter which generates a list of query term position(s) for
each item in a list of documents, or returns null if highlighting is disabled.
- AS...
https://issues.apache.org/jira/browse/SOLR-4722

This one is derived from FastVectorHighlighter, so make sure those 3 term*
attributes are enabled.

Scott Chu,scott@udngroup.com
2015/10/27 
- Original Message - 
From: Zheng Lin Edwin Yeo 
To: solr-user 
Date: 2015-10-23, 10:42:32
Subject: Re: Highlighting content field problem when using JiebaTokenizerFactory


Hi Scott,

Thank you for your respond.

1. You said the problem only happens on "contents" field, so maybe there're
something wrong with the contents of that field. Doe it contain any special
thing in them, e.g. HTML tags or symbols. I recall SOLR-42 mentions
something about HTML stripping will cause highlight problem. Maybe you can

try purify that fields to be closed to pure text and see if highlight comes
ok.
*A) I check that the SOLR-42 is mentioning about the
HTMLStripWhiteSpaceTokenizerFactory, which I'm not using. I believe that
tokenizer is already deprecated too. I've tried with all kinds of content
for rich-text documents, and all of them have the same problem.*

2. Maybe something imcompatible between JiebaTokenizer and Solr
highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
SmartChinese (I don't use this since I am dealing with Traditional Chinese

but I see you are dealing with Simplified Chinese), or 3rd-party MMSeg and

see if the problem goes away. However when I'm googling similar problem, I

saw you asked same question on August at Huaban/Jieba-analysis and somebody
said he also uses JiebaTokenizer but he doesn't have your problem. So I see
this could be less suspect.
*A) I was thinking about the incompatible issue too, as I previously
thought that JiebaTokenizer is optimised for Solr 4.x, so it may have issue
in 5.x. But the person from Hunban/Jieba-analysis said that he doesn't have
this problem in Solr 5.1. I also face the same problem in Solr 5.1, and
although I'm using Solr 5.3.0 now, the same problem persist. *

I'm looking at the indexing process too, to see if there's any problem
there. But just can't figure out why it only happen to JiebaTokenizer, and

it only happen for content field.


Regards,
Edwin


On 23 October 2015 at 09:41, Scott Chu <scott@udngroup.com> wrote:

> Hi Edwin,
>
> Since you've tested all my suggestions and the problem is still there, I

> can't think of anything wrong with your configuration. Now I can only
> suspect two things:
>
> 1. You said the problem only happens on "contents" field, so maybe
> there're something wrong with the contents of that field. Doe it contain

> any special thing in them, e.g. HTML tags or symbols. I recall SOLR-42
> mentions something about HTML stripping will cause highlight problem. Maybe
> you can try purify that fields to be closed to pure text and see if
> highlight comes ok.
>
> 2. Maybe something imcompatible between JiebaTokenizer and Solr
> highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
> SmartChinese (I don't use this since I am dealing with Traditional Chinese
> but I see you are dealing with Simplified Chinese), or 3rd-party MMSeg and
> see if the problem goes away. However when I'm googling similar problem, I
> saw you asked same question on August at Huaban/Jieba-analysis and somebody
> said he also uses JiebaTokenizer but he doesn't have your problem. So I see
> this could be less suspect.
>
> The theory of your problem could be something in indexing process causes

> wrong position info. for that field and when Solr do highlighting, it
> retrieves wrong position info. and mark wrong position of highlight target
> terms.
>
> Scott Chu,scott@udngroup.com
> 2015/10/23
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-22, 22:22:14
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Scott,
>
> Thank you for your response and suggestions.
>
> With respond to your questions, here are the answers:
>
> 1. I take a look at Jieba. It uses a dictionary and it seems to do a good
> job on CJK. I doubt this problem may be from those filters (note: I can
> understand you may use CJKWidthFilter to convert Japanese but doesn't
> understand why you use CJKBigramFilter and EdgeNGramFilter). Have you tried
> commenting out those filters, say leave only Jieba and StopFilter, and see
>
> if this problem disppears?
> *A) Yes, I have tried commenting out the other filters and only left wit

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-22 Thread Scott Chu
Hi solr-user,

I can't judge the cause from a quick glance at your definition, but I can
give some suggestions:

1. I took a look at Jieba. It uses a dictionary and seems to do a good job
on CJK. I suspect this problem may come from those filters (note: I can
understand that you may use CJKWidthFilter to convert Japanese, but I don't
understand why you use CJKBigramFilter and EdgeNGramFilter). Have you tried
commenting out those filters, say leaving only Jieba and StopFilter, to see
if the problem disappears?

2. Does this problem occur only on Chinese search words? Does it happen on
English search words?

3. To use FastVectorHighlighter, you seem to need to enable the 3 term*
parameters in the field declaration. I see only one is enabled. Please refer
to the answer in this stackoverflow question:
http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only


Scott Chu,scott@udngroup.com
2015/10/22 
- Original Message - 
From: Zheng Lin Edwin Yeo 
To: solr-user 
Date: 2015-10-20, 12:04:11
Subject: Re: Highlighting content field problem when using JiebaTokenizerFactory


Hi Scott,

Here's my schema.xml for content and title, which uses text_chinese. The
problem only occurs in content, and not in title.


   


  
 






 
 





  
   


Here's my solrconfig.xml on the highlighting portion:

  
  
   explicit
   10
   json
   true
  text
  id, title, content_type, last_modified, url, score 

  on
   id, title, content, author, tag
  true
   true
   html
  200
true
signature
true
100
  
  


 
WORD
en
SG
 



Meanwhile, I'll take a look at the articles too.

Thank you.

Regards,
Edwin


On 20 October 2015 at 11:32, Scott Chu <scott@udngroup.com> wrote:

> Hi Edwin,
>
> I didn't use Jieba on Chinese (I use only CJK, very foundamental, I
> know) so I didn't experience this problem.
>
> I'd suggest you post your schema.xml so we can see how you define your
> content field and the field type it uses?
>
> In the mean time, refer to these articles, maybe the answer or workaround
> can be deducted from them.
>
> https://issues.apache.org/jira/browse/SOLR-3390
>
> http://qnalist.com/questions/661133/solr-is-highlighting-wrong-words
>
> http://qnalist.com/questions/667066/highlighting-marks-wrong-words
>
> Good luck!
>
>
>
>
> Scott Chu,scott@udngroup.com
> 2015/10/20
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-13, 17:04:29
> *Subject: *Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi,
>
> I'm trying to use the JiebaTokenizerFactory to index Chinese characters in
>
> Solr. It works fine with the segmentation when I'm using
> the Analysis function on the Solr Admin UI.
>
> However, when I tried to do the highlighting in Solr, it is not
> highlighting in the correct place. For example, when I search of 自然環境与企業本身,
> it highlight 認為自然環境与企業本身的
>
> Even when I search for English character like responsibility, it highlight
>  *responsibilit*y.
>
> Basically, the highlighting goes off by 1 character/space consistently.
>
> This problem only happens in content field, and not in any other fields.

> Does anyone knows what could be causing the issue?
>
> I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.
>
>
> Regards,
> Edwin
>
>
>
>
>





Highlighting queries in parentheses

2015-10-22 Thread Michał Słomkowski

Hello,

recently I've deployed Solr 5.2.1 and I've observed the following issue:

My documents have two fields: id and text. Solr is configured to use 
FastVectorHighlighter (I've tried StandardHighlighter too, no 
difference). I've created the schema.xml, solrconfig.xml hasn't been 
changed in any way.


I have the following highlighting query: text:((foo AND bar) OR eggs). 
Let's say a document contains only bar and eggs. Currently both of 
them are highlighted. However, the desired behaviour is to highlight eggs 
only, since (foo AND bar) is not true.


The query I send has the following parameters:

'fl': 'id',
'hl': 'true',
'hl.requireFieldMatch': 'true',
'hl.fragListBuilder': 'single',
'hl.fragsize': '0',
'hl.fl': 'text',
'hl.mergeContiguous': 'true',
'hl.useFastVectorHighlighter': 'true',
'hl.q': 'text:((foo AND bar) OR eggs)'

I'd like to know what I should do to make it work as expected.
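
For anyone wanting to reproduce this, here is a minimal Python sketch that
turns the parameters above into a request URL. The host, port and core name
(localhost:8983, mycore) and the main q parameter are assumptions, since the
post only shows hl.q. The sketch only builds the request; it does not change
which terms get highlighted.

```python
from urllib.parse import urlencode

# Assumed endpoint; replace host, port and core name with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

params = {
    # Assumed main query; the original post only lists hl.q.
    "q": "text:((foo AND bar) OR eggs)",
    "fl": "id",
    "hl": "true",
    "hl.requireFieldMatch": "true",
    "hl.fragListBuilder": "single",
    "hl.fragsize": "0",
    "hl.fl": "text",
    "hl.mergeContiguous": "true",
    "hl.useFastVectorHighlighter": "true",
    "hl.q": "text:((foo AND bar) OR eggs)",
}


def build_url(base, query_params):
    """Append URL-encoded query parameters to the base select URL."""
    return base + "?" + urlencode(query_params)


url = build_url(SOLR_SELECT_URL, params)
print(url)
```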





Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-22 Thread Scott Chu
Hi Edwin,

Since you've tested all my suggestions and the problem is still there, I can't 
see anything wrong with your configuration. Now I can only suspect two 
things:

1. You said the problem only happens on the "content" field, so maybe there is 
something wrong with the contents of that field. Does it contain anything 
special, e.g. HTML tags or symbols? I recall SOLR-42 mentions that HTML 
stripping can cause highlighting problems. Maybe you can try cleaning that 
field so it is close to pure text and see if the highlighting comes out OK.

2. Maybe there is an incompatibility between JiebaTokenizer and the Solr 
highlighter. You could switch to another tokenizer, e.g. Standard, CJK, 
SmartChinese (I don't use this since I am dealing with Traditional Chinese, 
but I see you are dealing with Simplified Chinese), or the third-party MMSeg, 
and see if the problem goes away. However, when I googled for similar 
problems, I saw that you asked the same question in August at 
Huaban/Jieba-analysis and somebody said he also uses JiebaTokenizer but 
doesn't have your problem. So I consider this the less likely suspect.

My theory is that something in the indexing process produces wrong position 
info for that field, so when Solr does the highlighting, it retrieves the 
wrong position info and marks the wrong positions for the highlighted terms.
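
To illustrate the theory, here is a small Python sketch of how a stored offset
that is off by one produces exactly the kind of output reported in this thread.
This is not Solr code; the highlight function, the <em> markers and the sample
sentence are made up for illustration.

```python
def highlight(text, start, end, pre="<em>", post="</em>"):
    """Wrap text[start:end] in highlight markers."""
    return text[:start] + pre + text[start:end] + post + text[end:]


text = "social responsibility matters"
term = "responsibility"

# Offsets a tokenizer should report for the term.
start = text.index(term)
end = start + len(term)

correct = highlight(text, start, end)
# Same term, but with both offsets shifted one character to the left.
shifted = highlight(text, start - 1, end - 1)

print(correct)
print(shifted)
```

With the shifted offsets the marker swallows the preceding space and drops the
final "y", which matches the "responsibilit" symptom. If this is what is
happening, comparing the start and end offsets shown for the content field on
the Analysis page against the raw text might help narrow it down.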

Scott Chu,scott@udngroup.com
2015/10/23 
- Original Message - 
From: Zheng Lin Edwin Yeo 
To: solr-user 
Date: 2015-10-22, 22:22:14
Subject: Re: Highlighting content field problem when using JiebaTokenizerFactory


Hi Scott,

Thank you for your response and suggestions.

With respond to your questions, here are the answers:

1. I take a look at Jieba. It uses a dictionary and it seems to do a good
job on CJK. I doubt this problem may be from those filters (note: I can
understand you may use CJKWidthFilter to convert Japanese but doesn't
understand why you use CJKBigramFilter and EdgeNGramFilter). Have you tried
commenting out those filters, say leave only Jieba and StopFilter, and see

if this problem disppears?
*A) Yes, I have tried commenting out the other filters and only left with
Jieba and StopFilter. The problem is still there.*

2.Does this problem occur only on Chinese search words? Does it happen on
English search words?
*A) Yes, the same problem occurs on English words. For example, when I
search for "word", it will highlight in this way:  word*

3.To use FastVectorHighlighter, you seem to have to enable 3 term*
parameters in field declaration? I see only one is enabled. Please refer to
the answer in this stackoverflow question:
http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only
*A) I have tried to enable all 3 terms in the FastVectorHighlighter too,
but the same problem persists as well.*


Regards,
Edwin


On 22 October 2015 at 16:25, Scott Chu <scott@udngroup.com> wrote:

> Hi solr-user,
>
> Can't judge the cause on fast glimpse of your definition but some
> suggestions I can give:
>
> 1. I take a look at Jieba. It uses a dictionary and it seems to do a good
> job on CJK. I doubt this problem may be from those filters (note: I can
> understand you may use CJKWidthFilter to convert Japanese but doesn't
> understand why you use CJKBigramFilter and EdgeNGramFilter). Have you tried
> commenting out those filters, say leave only Jieba and StopFilter, and see
> if this problem disppears?
>
> 2.Does this problem occur only on Chinese search words? Does it happen on
> English search words?
>
> 3.To use FastVectorHighlighter, you seem to have to enable 3 term*
> parameters in field declaration? I see only one is enabled. Please refer to
> the answer in this stackoverflow question:
> http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only
>
>
> Scott Chu,scott@udngroup.com
> 2015/10/22
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-20, 12:04:11
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Scott,
>
> Here's my schema.xml for content and title, which uses text_chinese. The

> problem only occurs in content, and not in title.
>
>  omitNorms="true" termVectors="true"/>
>  omitNorms="true" termVectors="true"/>
>
>
>  positionIncrementGap="100">
> 
>  segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>  maxGramSize="15"/>
> 
> 
> 
>  segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> 
> 
> 
>
>
> Here's my solrconfig.xml on the hi

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-22 Thread Zheng Lin Edwin Yeo
Hi Scott,

Thank you for your response and suggestions.

In response to your questions, here are my answers:

1. I take a look at Jieba. It uses a dictionary and it seems to do a good
job on CJK. I doubt this problem may be from those filters (note: I can
understand you may use CJKWidthFilter to convert Japanese but doesn't
understand why you use CJKBigramFilter and EdgeNGramFilter). Have you tried
commenting out those filters, say leave only Jieba and StopFilter, and see
if this problem disappears?
*A) Yes, I have tried commenting out the other filters, leaving only
Jieba and StopFilter. The problem is still there.*

2.Does this problem occur only on Chinese search words? Does it happen on
English search words?
*A) Yes, the same problem occurs on English words. For example, when I
search for "word", it will highlight in this way:  word*

3.To use FastVectorHighlighter, you seem to have to enable 3 term*
parameters in field declaration? I see only one is enabled. Please refer to
the answer in this stackoverflow question:
http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only
*A) I have tried enabling all three term parameters for the
FastVectorHighlighter too, but the same problem persists.*


Regards,
Edwin


On 22 October 2015 at 16:25, Scott Chu <scott@udngroup.com> wrote:

> Hi solr-user,
>
> Can't judge the cause on fast glimpse of your definition but some
> suggestions I can give:
>
> 1. I take a look at Jieba. It uses a dictionary and it seems to do a good
> job on CJK. I doubt this problem may be from those filters (note: I can
> understand you may use CJKWidthFilter to convert Japanese but doesn't
> understand why you use CJKBigramFilter and EdgeNGramFilter). Have you tried
> commenting out those filters, say leave only Jieba and StopFilter, and see
> if this problem disppears?
>
> 2.Does this problem occur only on Chinese search words? Does it happen on
> English search words?
>
> 3.To use FastVectorHighlighter, you seem to have to enable 3 term*
> parameters in field declaration? I see only one is enabled. Please refer to
> the answer in this stackoverflow question:
> http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only
>
>
> Scott Chu,scott@udngroup.com
> 2015/10/22
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-20, 12:04:11
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Scott,
>
> Here's my schema.xml for content and title, which uses text_chinese. The
> problem only occurs in content, and not in title.
>
>  omitNorms="true" termVectors="true"/>
> omitNorms="true" termVectors="true"/>
>
>
>positionIncrementGap="100">
>  
>   segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>  maxGramSize="15"/>
> 
>  
>  
>   segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> 
>   
>
>
>
> Here's my solrconfig.xml on the highlighting portion:
>
>   
>   
>explicit
>10
>json
>true
>   text
>   id, title, content_type, last_modified, url, score 
>
>   on
>id, title, content, author, tag
>   true
>true
>html
>   200
> true
> signature
> true
> 100
>   
>   
>
>  class="solr.highlight.BreakIteratorBoundaryScanner">
>  
> WORD
> en
> SG
>  
> 
>
>
> Meanwhile, I'll take a look at the articles too.
>
> Thank you.
>
> Regards,
> Edwin
>
>
> On 20 October 2015 at 11:32, Scott Chu <scott@udngroup.com
> <+scott@udngroup.com>> wrote:
>
> > Hi Edwin,
> >
> > I didn't use Jieba on Chinese (I use only CJK, very foundamental, I
> > know) so I didn't experience this problem.
> >
> > I'd suggest you post your schema.xml so we can see how you define your
> > content field and the field type it uses?
> >
> > In the mean time, refer to these articles, maybe the answer or workaround
> > can be deducted from them.
> >
> > https://issues.apache.org/jira/browse/SOLR-3390
> >
> > http://qnalist.com/questions/661133/solr-is-highlighting-wrong-words
> >
> > http://qnalist.com/questions/667066/highlighting-marks-wrong-words
> >
> > Good luck!
> >
> >
> >
> >
> > Scott Chu,scott.

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-22 Thread Zheng Lin Edwin Yeo
Hi Scott,

Thank you for your response.

1. You said the problem only happens on "contents" field, so maybe there're
something wrong with the contents of that field. Doe it contain any special
thing in them, e.g. HTML tags or symbols. I recall SOLR-42 mentions
something about HTML stripping will cause highlight problem. Maybe you can
try purify that fields to be closed to pure text and see if highlight comes
ok.
*A) I checked, and SOLR-42 is about the
HTMLStripWhiteSpaceTokenizerFactory, which I'm not using. I believe that
tokenizer is already deprecated too. I've tried all kinds of content
from rich-text documents, and all of them have the same problem.*

2. Maybe something imcompatible between JiebaTokenizer and Solr
highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
SmartChinese (I don't use this since I am dealing with Traditional Chinese
but I see you are dealing with Simplified Chinese), or 3rd-party MMSeg and
see if the problem goes away. However when I'm googling similar problem, I
saw you asked same question on August at Huaban/Jieba-analysis and somebody
said he also uses JiebaTokenizer but he doesn't have your problem. So I see
this could be less suspect.
*A) I was thinking about an incompatibility too, as I previously
thought that JiebaTokenizer was optimised for Solr 4.x, so it might have issues
in 5.x. But the person from Huaban/Jieba-analysis said that he doesn't have
this problem in Solr 5.1. I also faced the same problem in Solr 5.1, and
although I'm using Solr 5.3.0 now, the same problem persists.*

I'm looking at the indexing process too, to see if there's any problem
there. But I just can't figure out why it only happens with JiebaTokenizer,
and only for the content field.


Regards,
Edwin


On 23 October 2015 at 09:41, Scott Chu <scott@udngroup.com> wrote:

> Hi Edwin,
>
> Since you've tested all my suggestions and the problem is still there, I
> can't think of anything wrong with your configuration. Now I can only
> suspect two things:
>
> 1. You said the problem only happens on "contents" field, so maybe
> there're something wrong with the contents of that field. Doe it contain
> any special thing in them, e.g. HTML tags or symbols. I recall SOLR-42
> mentions something about HTML stripping will cause highlight problem. Maybe
> you can try purify that fields to be closed to pure text and see if
> highlight comes ok.
>
> 2. Maybe something imcompatible between JiebaTokenizer and Solr
> highlighter. If you switch to other tokenizers, e.g. Standard, CJK,
> SmartChinese (I don't use this since I am dealing with Traditional Chinese
> but I see you are dealing with Simplified Chinese), or 3rd-party MMSeg and
> see if the problem goes away. However when I'm googling similar problem, I
> saw you asked same question on August at Huaban/Jieba-analysis and somebody
> said he also uses JiebaTokenizer but he doesn't have your problem. So I see
> this could be less suspect.
>
> The theory of your problem could be something in indexing process causes
> wrong position info. for that field and when Solr do highlighting, it
> retrieves wrong position info. and mark wrong position of highlight target
> terms.
>
> Scott Chu,scott@udngroup.com
> 2015/10/23
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-22, 22:22:14
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Scott,
>
> Thank you for your response and suggestions.
>
> With respond to your questions, here are the answers:
>
> 1. I take a look at Jieba. It uses a dictionary and it seems to do a good
> job on CJK. I doubt this problem may be from those filters (note: I can
> understand you may use CJKWidthFilter to convert Japanese but doesn't
> understand why you use CJKBigramFilter and EdgeNGramFilter). Have you tried
> commenting out those filters, say leave only Jieba and StopFilter, and see
>
> if this problem disppears?
> *A) Yes, I have tried commenting out the other filters and only left with
> Jieba and StopFilter. The problem is still there.*
>
> 2.Does this problem occur only on Chinese search words? Does it happen on
> English search words?
> *A) Yes, the same problem occurs on English words. For example, when I
> search for "word", it will highlight in this way:  word*
>
> 3.To use FastVectorHighlighter, you seem to have to enable 3 term*
> parameters in field declaration? I see only one is enabled. Please refer to
> the answer in this stackoverflow question:
>
> http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only
> *A) I have tried to enable all 3 terms in the FastVe

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Scott Stults
Edwin,

Try setting hl.bs.language and hl.bs.country in your request or
requestHandler:

https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter#FastVectorHighlighter-UsingBoundaryScannerswiththeFastVectorHighlighter
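
A minimal Python sketch of what such a request could look like. The query, the
field name and the boundary scanner name "breakIterator" are assumptions (the
name has to match a boundaryScanner defined in solrconfig.xml); the WORD, en
and SG values are taken from the configuration posted in this thread.

```python
from urllib.parse import urlencode

params = {
    "q": "content:responsibility",          # assumed example query
    "hl": "true",
    "hl.fl": "content",
    "hl.useFastVectorHighlighter": "true",
    # Assumed name of a boundaryScanner configured in solrconfig.xml.
    "hl.boundaryScanner": "breakIterator",
    "hl.bs.type": "WORD",
    "hl.bs.language": "en",
    "hl.bs.country": "SG",
}

# Query string to append to the select handler URL of your core.
query_string = urlencode(params)
print(query_string)
```

The same keys can instead go into the defaults of the requestHandler if they
should apply to every request.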


-Scott

On Tue, Oct 13, 2015 at 5:04 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi,
>
> I'm trying to use the JiebaTokenizerFactory to index Chinese characters in
> Solr. It works fine with the segmentation when I'm using
> the Analysis function on the Solr Admin UI.
>
> However, when I tried to do the highlighting in Solr, it is not
> highlighting in the correct place. For example, when I search of 自然环境与企业本身,
> it highlight 认为自然环境与企业本身的
>
> Even when I search for English character like  responsibility, it highlight
>   *responsibilit*y.
>
> Basically, the highlighting goes off by 1 character/space consistently.
>
> This problem only happens in content field, and not in any other fields.
> Does anyone knows what could be causing the issue?
>
> I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.
>
>
> Regards,
> Edwin
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Zheng Lin Edwin Yeo
Hi Scott,

Here's my schema.xml for content and title, which uses text_chinese. The
problem only occurs in content, and not in title.


   


  
 






 
 





  
   


Here's my solrconfig.xml on the highlighting portion:

  
  
   explicit
   10
   json
   true
  text
  id, title, content_type, last_modified, url, score 

  on
   id, title, content, author, tag
  true
   true
   html
  200
true
signature
true
100
  
  


 
WORD
en
SG
 



Meanwhile, I'll take a look at the articles too.

Thank you.

Regards,
Edwin


On 20 October 2015 at 11:32, Scott Chu <scott@udngroup.com> wrote:

> Hi Edwin,
>
> I didn't use Jieba on Chinese (I use only CJK, very foundamental, I
> know) so I didn't experience this problem.
>
> I'd suggest you post your schema.xml so we can see how you define your
> content field and the field type it uses?
>
> In the mean time, refer to these articles, maybe the answer or workaround
> can be deducted from them.
>
> https://issues.apache.org/jira/browse/SOLR-3390
>
> http://qnalist.com/questions/661133/solr-is-highlighting-wrong-words
>
> http://qnalist.com/questions/667066/highlighting-marks-wrong-words
>
> Good luck!
>
>
>
>
> Scott Chu,scott@udngroup.com
> 2015/10/20
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> *To: *solr-user <solr-user@lucene.apache.org>
> *Date: *2015-10-13, 17:04:29
> *Subject: *Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi,
>
> I'm trying to use the JiebaTokenizerFactory to index Chinese characters in
>
> Solr. It works fine with the segmentation when I'm using
> the Analysis function on the Solr Admin UI.
>
> However, when I tried to do the highlighting in Solr, it is not
> highlighting in the correct place. For example, when I search of 自然環境与企業本身,
> it highlight 認為自然環境与企業本身的
>
> Even when I search for English character like responsibility, it highlight
>   *responsibilit*y.
>
> Basically, the highlighting goes off by 1 character/space consistently.
>
> This problem only happens in content field, and not in any other fields.
> Does anyone knows what could be causing the issue?
>
> I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.
>
>
> Regards,
> Edwin
>
>
>
>
>


Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Zheng Lin Edwin Yeo
Hi Scott,

Thank you for your reply.

I've tried setting that and also tried switching to the FastVectorHighlighter,
but it still doesn't work. I get the same highlighting results as
before.

Regards,
Edwin


On 19 October 2015 at 23:56, Scott Stults <sstu...@opensourceconnections.com
> wrote:

> Edwin,
>
> Try setting hl.bs.language and hl.bs.country in your request or
> requestHandler:
>
>
> https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter#FastVectorHighlighter-UsingBoundaryScannerswiththeFastVectorHighlighter
>
>
> -Scott
>
> On Tue, Oct 13, 2015 at 5:04 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com
> >
> wrote:
>
> > Hi,
> >
> > I'm trying to use the JiebaTokenizerFactory to index Chinese characters
> in
> > Solr. It works fine with the segmentation when I'm using
> > the Analysis function on the Solr Admin UI.
> >
> > However, when I tried to do the highlighting in Solr, it is not
> > highlighting in the correct place. For example, when I search of
> 自然环境与企业本身,
> > it highlight 认为自然环境与企业本身的
> >
> > Even when I search for English character like  responsibility, it
> highlight
> >   *responsibilit*y.
> >
> > Basically, the highlighting goes off by 1 character/space consistently.
> >
> > This problem only happens in content field, and not in any other fields.
> > Does anyone knows what could be causing the issue?
> >
> > I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.
> >
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com
>


Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Scott Chu
Hi Edwin,

I don't use Jieba for Chinese (I use only CJK, very fundamental, I know), so I 
haven't experienced this problem. 

I'd suggest you post your schema.xml so we can see how you define your content 
field and the field type it uses.

In the meantime, refer to these articles; maybe the answer or a workaround can 
be deduced from them.

https://issues.apache.org/jira/browse/SOLR-3390

http://qnalist.com/questions/661133/solr-is-highlighting-wrong-words

http://qnalist.com/questions/667066/highlighting-marks-wrong-words

Good luck!




Scott Chu,scott@udngroup.com
2015/10/20 
- Original Message - 
From: Zheng Lin Edwin Yeo 
To: solr-user 
Date: 2015-10-13, 17:04:29
Subject: Highlighting content field problem when using JiebaTokenizerFactory


Hi,

I'm trying to use the JiebaTokenizerFactory to index Chinese characters in

Solr. It works fine with the segmentation when I'm using
the Analysis function on the Solr Admin UI.

However, when I tried to do the highlighting in Solr, it is not
highlighting in the correct place. For example, when I search of 自然環境与企業本身,
it highlight 認為自然環境与企業本身的

Even when I search for English character like responsibility, it highlight
  *responsibilit*y.

Basically, the highlighting goes off by 1 character/space consistently.

This problem only happens in content field, and not in any other fields.
Does anyone knows what could be causing the issue?

I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.


Regards,
Edwin





Highlighting content field problem when using JiebaTokenizerFactory

2015-10-13 Thread Zheng Lin Edwin Yeo
Hi,

I'm trying to use the JiebaTokenizerFactory to index Chinese characters in
Solr. It works fine with the segmentation when I'm using
the Analysis function on the Solr Admin UI.

However, when I try highlighting in Solr, it does not highlight in the
correct place. For example, when I search for 自然环境与企业本身,
it highlights 认为自然环境与企业本身的

Even when I search for an English word like responsibility, it highlights
  *responsibilit*y.

Basically, the highlighting is consistently off by one character/space.

This problem only happens in the content field, and not in any other fields.
Does anyone know what could be causing the issue?

I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.


Regards,
Edwin


Re: Highlighting tag is not showing occasionally

2015-10-09 Thread Zheng Lin Edwin Yeo
I found that it could be due to the EdgeNGramFilterFactory. This issue
didn't happen if I did not apply the EdgeNGramFilterFactory filter for my
fieldType.

But does anyone know why using the EdgeNGramFilterFactory causes this
problem?

Regards,
Edwin


On 7 October 2015 at 17:46, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi,
>
> Has anyone face the problem of when using highlighting, sometimes there
> are results which are returned, but there is no highlighting to the result
> (ie: no  tag).
>
> I found that there is a match in another field which I did not include in
> my hl.fl parameters when I do fl=*, but that same word acutally does appear
> in content.
>
> Would like to find out, why sometimes there is a match in content, but it
> is not highlighted (the word is not in the stopword list)? Did I make any
> mistakes in my configuration?
>
> I've include my highlighting request handler from solrconfig.xml here.
>
> 
> 
> explicit
> 10
> json
> true
> text
> id, title, content_type, last_modified, url, score 
>
> on
> id, title, content, author, tag
>   true
> true
> html
> 200
>
> true
> signature
> true
> 100
> 
> 
>
>
> Regards,
> Edwin
>


Highlighting tag is not showing occasionally

2015-10-07 Thread Zheng Lin Edwin Yeo
Hi,

Has anyone faced the problem where, when using highlighting, some results
are returned without any highlighting (i.e. no highlighting tag)?

I found that there is a match in another field which I did not include in
my hl.fl parameter when I use fl=*, but that same word actually does appear
in content.

I would like to find out why there is sometimes a match in content that is
not highlighted (the word is not in the stopword list). Did I make any
mistakes in my configuration?

I've included my highlighting request handler from solrconfig.xml here.



explicit
10
json
true
text
id, title, content_type, last_modified, url, score 

on
id, title, content, author, tag
  true
true
html
200

true
signature
true
100




Regards,
Edwin


Re: highlighting

2015-10-02 Thread Upayavira
In the end, in most open source projects, people implement that which
they need themselves, and offer it back to the community in the hope
that it will help others too.

If you need this, then I'd encourage you to look at the source
highlighting component and see if you can see how it might be done.

It would then be great to put your thoughts and ideas into a JIRA
ticket.

Upayavira

On Thu, Oct 1, 2015, at 11:31 PM, Teague James wrote:
> Hi everyone!
> 
> Pardon if it's not proper etiquette to chime in, but that feature would
> solve some issues I have with my app for the same reason. We are using
> markers now and it is very clunky - particularly with phrases and certain
> special characters. I would love to see this feature too Mark! For what
> it's worth - up vote. Thanks!
> 
> Cheers!
> 
> -Teague James
> 
> > On Oct 1, 2015, at 6:12 PM, Koji Sekiguchi <koji.sekigu...@rondhuit.com> 
> > wrote:
> > 
> > Hi Mark,
> > 
> > I think I saw similar requirement recently in mailing list. The feature 
> > sounds reasonable to me.
> > 
> > > If not, how do I go about posting this as a feature request?
> > 
> > JIRA can be used for the purpose, but there is no guarantee that the 
> > feature is implemented. :(
> > 
> > Koji
> > 
> >> On 2015/10/01 20:07, Mark Fenbers wrote:
> >> Yeah, I thought about using markers, but then I'd have to search the the 
> >> text for the markers to
> >> determine the locations.  This is a clunky way of getting the results I 
> >> want, and it would save two
> >> steps if Solr merely had an option to return a start/length array (of what 
> >> should be highlighted) in
> >> the original string rather than returning an altered string with tags 
> >> inserted.
> >> 
> >> Mark
> >> 
> >>> On 9/29/2015 7:04 AM, Upayavira wrote:
> >>> You can change the strings that are inserted into the text, and could
> >>> place markers that you use to identify the start/end of highlighting
> >>> elements. Does that work?
> >>> 
> >>> Upayavira
> >>> 
> >>>> On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:
> >>>> Greetings!
> >>>> 
> >>>> I have highlighting turned on in my Solr searches, but what I get back
> >>>> is  tags surrounding the found term.  Since I use a SWT StyledText
> >>>> widget to display my search results, what I really want is the offset
> >>>> and length of each found term, so that I can highlight it in my own way
> >>>> without HTML.  Is there a way to configure Solr to do that?  I couldn't
> >>>> find it.  If not, how do I go about posting this as a feature request?
> >>>> 
> >>>> Thanks,
> >>>> Mark
> > 


Re: highlighting

2015-10-01 Thread Mark Fenbers
Yeah, I thought about using markers, but then I'd have to search the 
text for the markers to determine the locations.  This is a clunky way 
of getting the results I want, and it would save two steps if Solr 
merely had an option to return a start/length array (of what should be 
highlighted) in the original string rather than returning an altered 
string with tags inserted.


Mark

On 9/29/2015 7:04 AM, Upayavira wrote:

You can change the strings that are inserted into the text, and could
place markers that you use to identify the start/end of highlighting
elements. Does that work?

Upayavira

On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:

Greetings!

I have highlighting turned on in my Solr searches, but what I get back
is HTML tags surrounding the found term.  Since I use a SWT StyledText
widget to display my search results, what I really want is the offset
and length of each found term, so that I can highlight it in my own way
without HTML.  Is there a way to configure Solr to do that?  I couldn't
find it.  If not, how do I go about posting this as a feature request?

Thanks,
Mark




Re: highlighting

2015-10-01 Thread Koji Sekiguchi

Hi Mark,

I think I saw a similar requirement recently on the mailing list. The feature sounds 
reasonable to me.

> If not, how do I go about posting this as a feature request?

JIRA can be used for that purpose, but there is no guarantee that the feature will be 
implemented. :(

Koji

On 2015/10/01 20:07, Mark Fenbers wrote:

Yeah, I thought about using markers, but then I'd have to search the the text 
for the markers to
determine the locations.  This is a clunky way of getting the results I want, 
and it would save two
steps if Solr merely had an option to return a start/length array (of what 
should be highlighted) in
the original string rather than returning an altered string with tags inserted.

Mark

On 9/29/2015 7:04 AM, Upayavira wrote:

You can change the strings that are inserted into the text, and could
place markers that you use to identify the start/end of highlighting
elements. Does that work?

Upayavira

On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:

Greetings!

I have highlighting turned on in my Solr searches, but what I get back
is  tags surrounding the found term.  Since I use a SWT StyledText
widget to display my search results, what I really want is the offset
and length of each found term, so that I can highlight it in my own way
without HTML.  Is there a way to configure Solr to do that?  I couldn't
find it.  If not, how do I go about posting this as a feature request?

Thanks,
Mark






Re: highlighting

2015-10-01 Thread Teague James
Hi everyone!

Pardon if it's not proper etiquette to chime in, but that feature would solve 
some issues I have with my app for the same reason. We are using markers now 
and it is very clunky - particularly with phrases and certain special 
characters. I would love to see this feature too Mark! For what it's worth - up 
vote. Thanks!

Cheers!

-Teague James

> On Oct 1, 2015, at 6:12 PM, Koji Sekiguchi <koji.sekigu...@rondhuit.com> 
> wrote:
> 
> Hi Mark,
> 
> I think I saw similar requirement recently in mailing list. The feature 
> sounds reasonable to me.
> 
> > If not, how do I go about posting this as a feature request?
> 
> JIRA can be used for the purpose, but there is no guarantee that the feature 
> is implemented. :(
> 
> Koji
> 
>> On 2015/10/01 20:07, Mark Fenbers wrote:
>> Yeah, I thought about using markers, but then I'd have to search the the 
>> text for the markers to
>> determine the locations.  This is a clunky way of getting the results I 
>> want, and it would save two
>> steps if Solr merely had an option to return a start/length array (of what 
>> should be highlighted) in
>> the original string rather than returning an altered string with tags 
>> inserted.
>> 
>> Mark
>> 
>>> On 9/29/2015 7:04 AM, Upayavira wrote:
>>> You can change the strings that are inserted into the text, and could
>>> place markers that you use to identify the start/end of highlighting
>>> elements. Does that work?
>>> 
>>> Upayavira
>>> 
>>>> On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:
>>>> Greetings!
>>>> 
>>>> I have highlighting turned on in my Solr searches, but what I get back
>>>> is  tags surrounding the found term.  Since I use a SWT StyledText
>>>> widget to display my search results, what I really want is the offset
>>>> and length of each found term, so that I can highlight it in my own way
>>>> without HTML.  Is there a way to configure Solr to do that?  I couldn't
>>>> find it.  If not, how do I go about posting this as a feature request?
>>>> 
>>>> Thanks,
>>>> Mark
> 


Re: highlighting

2015-09-29 Thread Upayavira
You can change the strings that are inserted into the text, and could
place markers that you use to identify the start/end of highlighting
elements. Does that work?

Upayavira

On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:
> Greetings!
> 
> I have highlighting turned on in my Solr searches, but what I get back 
> is  tags surrounding the found term.  Since I use a SWT StyledText 
> widget to display my search results, what I really want is the offset 
> and length of each found term, so that I can highlight it in my own way 
> without HTML.  Is there a way to configure Solr to do that?  I couldn't 
> find it.  If not, how do I go about posting this as a feature request?
> 
> Thanks,
> Mark
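
The marker approach discussed in this thread can be sketched as follows. This is only a minimal illustration, assuming the highlighter was asked (via hl.simple.pre and hl.simple.post) to wrap each term in markers that never occur in the text itself. The function name extract_highlight_spans and the default marker strings are hypothetical, and the offsets it returns are relative to the returned snippet, not necessarily to the full stored field.

```python
def extract_highlight_spans(marked, pre="<em>", post="</em>"):
    """Strip highlight markers and return (plain_text, [(offset, length), ...]).

    Offsets and lengths refer to positions in the returned plain text.
    """
    plain_parts = []
    spans = []
    pos = 0        # read position in the marked-up string
    plain_len = 0  # length of the plain text emitted so far
    while True:
        start = marked.find(pre, pos)
        if start == -1:
            # No more opening markers: the rest is plain text.
            plain_parts.append(marked[pos:])
            break
        end = marked.find(post, start + len(pre))
        if end == -1:
            # Unmatched opening marker: keep the remainder unchanged.
            plain_parts.append(marked[pos:])
            break
        # Text before the marker is copied as-is.
        plain_parts.append(marked[pos:start])
        plain_len += start - pos
        # The highlighted term itself.
        term = marked[start + len(pre):end]
        spans.append((plain_len, len(term)))
        plain_parts.append(term)
        plain_len += len(term)
        pos = end + len(post)
    return "".join(plain_parts), spans


plain, spans = extract_highlight_spans("the <em>quick</em> brown <em>fox</em>")
```

The resulting (offset, length) pairs could then be handed to whatever widget does the styling. Choosing markers that cannot appear in the indexed text (for example, control characters) would avoid false matches, though that choice is left to the application.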


highlighting

2015-09-28 Thread Mark Fenbers

Greetings!

I have highlighting turned on in my Solr searches, but what I get back 
is <em> tags surrounding the found term.  Since I use an SWT StyledText 
widget to display my search results, what I really want is the offset 
and length of each found term, so that I can highlight it in my own way 
without HTML.  Is there a way to configure Solr to do that?  I couldn't 
find it.  If not, how do I go about posting this as a feature request?


Thanks,
Mark


Re: Help storing + highlighting search results in PDF newspapers

2015-09-11 Thread Erick Erickson
Yeah, there are a lot of moving parts to connect

Let's see the highlight configuration you're
using. Should be in your solrconfig.xml file for the request
handler you're using. Are you calling out the field you want
highlighted in the hl.fl list?


Unfortunately getting specific fields populated is tricky since
Tika has to deal with all the file formats which store
meta-data in various ways, i.e. Word is completely
unrelated to PDF which is unrelated to (pick your
file format here). But we can deal with that after you
get some basic highlighting done.

And I tend to prefer to do my Tika parsing on a client,
it gives me more control over what happens and moves
the burden off the Solr server, here's a place to get
started if you want to pursue that avenue.

http://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick


On Fri, Sep 11, 2015 at 8:47 AM, Colin 't Hart <co...@sharpheart.org> wrote:
> Hi,
>
> I'm having trouble negotiating the steep Solr learning curve...
>
> 1. I'm trying to store scanned and OCRed newspapers in PDF format into Solr
> for full-text searching.
> I've tried most (all?) of the examples and sample configurations that come
> with Solr 5.3.0 and I can upload the PDFs.
> Searching works, but for the life of me I can't get highlights in the
> results.
>
> I tried setting the "store" attribute of the "_text_" and/or "content"
> fields to "true" but that didn't help -- just increased the size of the
> query response -- and lots of PDF data appeared in the response instead of
> just the text -- but the "highlighting" section of the response was still
> virtually empty (just lists matching documents, but no highlighted text
> fragments).
>
>
> Can someone point me in the direction of a sample config that will work?
>
>
> 2. After that's working I'd like to trim this down to a minimal schema with
> just
>
> * title
> * date
> * volume
> * number
> * URL (the PDFs themselves will be made available online for viewing using
> the same viewer.js that's embedded in Firefox)
>
> as metadata (as well as the required metadata such as id and _version_).
>
> I want to extract these metadata fields from the filenames -- I presume
> that's also possible? Can someone point me to how I would go about doing
> this too?
>
>
> 3. The newspapers are in Swedish. I've found the Swedish stopwords list;
> are there any other dictionaries etc available to assist with queries where
> words have different forms for eg plurals, eg "flicka" (girl), "flickor"
> (girls)?
>
>
>
> Much thanks in advance!
>
> Regards,
>
> Colin


Help storing + highlighting search results in PDF newspapers

2015-09-11 Thread Colin 't Hart
Hi,

I'm having trouble negotiating the steep Solr learning curve...

1. I'm trying to store scanned and OCRed newspapers in PDF format into Solr
for full-text searching.
I've tried most (all?) of the examples and sample configurations that come
with Solr 5.3.0 and I can upload the PDFs.
Searching works, but for the life of me I can't get highlights in the
results.

I tried setting the "stored" attribute of the "_text_" and/or "content"
fields to "true" but that didn't help -- just increased the size of the
query response -- and lots of PDF data appeared in the response instead of
just the text -- but the "highlighting" section of the response was still
virtually empty (just lists matching documents, but no highlighted text
fragments).


Can someone point me in the direction of a sample config that will work?


2. After that's working I'd like to trim this down to a minimal schema with
just

* title
* date
* volume
* number
* URL (the PDFs themselves will be made available online for viewing using
the same viewer.js that's embedded in Firefox)

as metadata (as well as the required metadata such as id and _version_).

I want to extract these metadata fields from the filenames -- I presume
that's also possible? Can someone point me to how I would go about doing
this too?


3. The newspapers are in Swedish. I've found the Swedish stopwords list;
are there any other dictionaries etc available to assist with queries where
words have different forms for eg plurals, eg "flicka" (girl), "flickor"
(girls)?



Much thanks in advance!

Regards,

Colin


Highlighting snippets truncated when matching large number of indexed documents

2015-09-02 Thread hsharma mailinglists
Hi there,

I'm observing that the snippets being returned in the highlighting
section of the response are getting truncated. However, this behavior
is being seen only when the query matches a large number of documents
and the results requested are near the end of the Solr-returned
overall results list.

I'm using Solr 5.2.1 (Java 1.8.0_51) and my document is defined in
terms of the following two fields, as specified in the schema file:

  

  
  

  
  

  
  
  


  
  

  

  
  
  

Hence, the fields of interest are called "name" and "name_edgengram".

I search for the word 'data' and Solr indicates that there are 565
results. I retrieve 10 results at a time, and the highlighting works
fine till I make a request to Solr for getting 10 results starting at
number 490. The http request made is >>

http://localhost:8983/solr/mycore/select?q=name%3A%22data%22+OR+name_edgengram%3A%22data%22&start=490&fl=id%2Cname&wt=json&indent=true&hl=true&hl.fl=name%2Cname_edgengram&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E&hl.highlightMultiTerm=true&hl.fragSize=0
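
For readers who want to reproduce a request like this one, here is a small sketch that assembles the same query string from a parameter dictionary. The parameter names and values are taken from the params echoed in the response header of this message; the helper name build_highlight_url is hypothetical, and the host and core name are simply those from the post.

```python
from urllib.parse import urlencode


def build_highlight_url(base, params):
    """Return base plus a URL-encoded query string built from params."""
    return base + "?" + urlencode(params)


# Parameters as echoed back by Solr in the response header of this message.
params = {
    "q": 'name:"data" OR name_edgengram:"data"',
    "start": "490",
    "fl": "id,name",
    "wt": "json",
    "indent": "true",
    "hl": "true",
    "hl.fl": "name,name_edgengram",
    "hl.simple.pre": "<em>",
    "hl.simple.post": "</em>",
    "hl.highlightMultiTerm": "true",
    "hl.fragSize": "0",
}

url = build_highlight_url("http://localhost:8983/solr/mycore/select", params)
print(url)
```

Letting urlencode do the escaping avoids hand-encoding the quotes, colons and angle brackets.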

My highlighting parameters are specified at query-time. I get the
following json response from Solr >>


{
  "responseHeader": {
    "status": 0,
    "QTime": 76,
    "params": {
      "q": "name:\"data\" OR name_edgengram:\"data\"",
      "hl": "true",
      "hl.simple.post": "</em>",
      "indent": "true",
      "fl": "id,name",
      "start": "490",
      "hl.fragSize": "0",
      "hl.fl": "name,name_edgengram",
      "wt": "json",
      "hl.simple.pre": "<em>",
      "hl.highlightMultiTerm": "true"
    }
  },
  "response": {
"numFound": 565,
"start": 490,
"docs": [
  {
"name":
"software/information-management/cq-image-jsp-/content/sascom/en_us/software/data-management/jcr:content/par/tabctrl_d036/tab-2-tabImage",
"id": "p-798-pn9058800-uu303582258"
  },
  {
"name":
"en_us/whitepapers/how-to-advance-your-data-mining-predictive-analytics-with-modern-techniques-106219.html",
"id": "p-798-pn9677905-uu304125128"
  },
  {
"name":
"en_us/insights/cq-image-jsp-/content/sascom/en_us/insights/data-management/jcr:content/par/tabctrl_4a63/tab-0/styledcontainer_231d/par/styledcontainer_3919/par/image_a747",
"id": "p-798-pn9058609-uu303582055"
  },
  {
"name":
"software/smb/cq-textimage-jsp-/content/sascom/en_us/software/small-midsize-business/desktop-data-mining/jcr:content/par/styledcontainer_6b5c/par/contentcarousel_ea6/cntntcarousel/textimage_e28",
"id": "p-798-pn9058629-uu303582076"
  },
  {
"name":
"en_us/whitepapers/harvard-business-review-the-evolution-of-decision-making-how-leading-organizations-are-adopting-a-data-driven-culture105998.html",
    "id": "p-798-pn9677481-uu297657017"
  },
  {
"id": "kw-798-3075204",
"name": "mpp database"
  },
  {
"id": "kw-798-951983",
"name": "In-Database Analytics"
  },
  {
"id": "kw-798-3075206",
"name": "in-memory database"
  },
  {
"name": "software/data_mining/",
"id": "p-798-pn30459505-uu376483712"
  },
  {
"name": "rnd/datavisualization/",
"id": "p-798-pn68559-uu524630"
  }
]
  },
  "highlighting": {
"p-798-pn9058800-uu303582258": {
  "name": [

"software/information-management/cq-image-jsp-/content/sascom/en_us/software/data"
  ],
  "name_edgengram": [

"software/information-management/cq-image-jsp-/content/sascom/en_us/software/data"
  ]
},
"p-798-pn9677905-uu304125128": {
  "name": [

"en_us/whitepapers/how-to-advance-your-data-mining-predictive-analytics-with"
  ],
  "name_edgengram": [

"en_us/whitepapers/how-to-advance-your-data-mining-predictive-analytics-with"
  ]
},
"p-798-pn9058609-uu303582055": {
  "name": [

"en_us/insights/cq-image-jsp-/content/sascom/en_us/insights/data-management"
  ],
  "name_edgengram": [

"en_us/insights/cq-image-jsp-/content/sascom

Solr having problems with highlighting when using Jieba analyzer

2015-08-19 Thread Zheng Lin Edwin Yeo
Hi,

I'm using the Jieba analyser to index Chinese characters in Solr. Segmentation
works fine when using the Analysis page in the Solr Admin UI.

However, when I try highlighting in Solr, it does not highlight in
the correct place. For example, when I search for 自然环境与企业本身, it highlights
认<em>为自然环</em><em>境</em><em>与企</em><em>业本</em>身的

Even when I search for the English word responsibility, it highlights
<em> responsibilit</em>y.

I'm using jieba-analysis-1.0.0, Solr 5.2.1 and Lucene 5.1.0

Regards,
Edwin


Re: Highlighting, all matches show empty {}

2015-08-12 Thread Scott Derrick
I think the highlighter is actually running, but I'm not getting the 
results??


with this request

http://localhost:8983/solr/mbepp/select?q=concord&fl=accession%2C+title%2C+author%2C+date&wt=json&indent=true&hl=true&hl.fl=*


I get this response

{
  "responseHeader":{
    "status":0,
    "QTime":3,
    "params":{
      "q":"concord",
      "hl":"true",
      "indent":"true",
      "fl":"accession, title, author, date",
      "hl.fl":"*",
      "wt":"json"}},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "date":"1890-02-26",
        "author":"Mary Baker Eddy",
        "accession":"L13943",
        "title":["Mary Baker Eddy to Joseph E. Adams,"]},
      {
        "date":"1896-01-13",
        "author":"Mary Baker Eddy",
        "accession":"L03453",
        "title":["Mary Baker Eddy to Ira O. Knapp,"]},
      {
        "date":"1902-06-15",
        "author":"Mary Baker Eddy",
        "accession":"A10145",
        "title":["Message of the Pastor Emeritus to The First Church of Christ, Scientist, Boston, Mass., June 15, 1902"]}]
  },
  "highlighting":{
    "/home/scott/workspace/mbel-work/tei2html/build/web/L13943/L13943.html":{},
    "/home/scott/workspace/mbel-work/tei2html/build/web/L03453/L03453.html":{},
    "/home/scott/workspace/mbel-work/tei2html/build/web/A10145/A10145.html":{}}}

Before running the request, I set Watch Changes on the admin Plugins/Stats 
page.  Highlighting showed 2 changes, the GapFragmenter and the HtmlFormatter.


here are the reported changes

org.apache.solr.highlight.GapFragmenter
class: org.apache.solr.highlight.GapFragmenter
version: 5.2.1
description: GapFragmenter
stats: requests: Was: 117, Now: 156, Delta: 39

org.apache.solr.highlight.HtmlFormatter
class: org.apache.solr.highlight.HtmlFormatter
version:5.2.1
description:HtmlFormatter
stats: requests: Was: 117, Now: 156, Delta: 39

It looks to me like 39 fragments or something were processed, yet you 
can see above that the highlights are empty {}.


None of the other classes in the highlighter showed any changes.

They are:

org.apache.solr.highlight.BreakIteratorBoundaryScanner
org.apache.solr.highlight.HtmlEncoder
org.apache.solr.highlight.RegexFragmenter
org.apache.solr.highlight.ScoreOrderFragmentsBuilder
org.apache.solr.highlight.SimpleBoundaryScanner
org.apache.solr.highlight.SimpleFragListBuilder
org.apache.solr.highlight.SingleFragListBuilder
org.apache.solr.highlight.WeightedFragListBuilder


Scott

 Original Message 
Subject: Highlighting, all matches show empty {}
From: Scott Derrick sc...@tnstaafl.net
To: solr-user@lucene.apache.org
Date: 08/12/2015 08:20 AM


Tried submitting a field for hl.fl; still empty {}

here are the query terms

"responseHeader": {
 "status": 0,
 "QTime": 8,
 "params": {
   "q": "mary or calvin",
   "hl": "true",
   "hl.simple.post": "</em>",
   "indent": "true",
   "fl": "accession, title, author, date",
   "hl.fl": "*",
   "wt": "json",
   "hl.simple.pre": "<em>",
   "_": "1439388969240"
 }

here is one of the responses, there were 135

{
 date: 1886-07-06,
 author: Mary Baker Eddy,
 accession: L02634,
 title: [
   Mary Baker Eddy to Josephine C. Woodbury, July 6, 1886
 ]
},

here is the highlight section listing the first 10 matches, still empty {}

highlighting: {

/home/scott/workspace/mbel-work/tei2html/build/web/./L02634/L02634.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./A10720/A10720.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./L07894/L07894.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./L09828/L09828.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./A10636D/A10636D.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./L13943/L13943.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./A10879/A10879.html:
{},

/home/scott/workspace/mbel-work/tei2html/build/web/./L3/L3.html:
{}
   }


 Original Message 
Subject: Re: Highlighting
From: Scott Derrick sc...@tnstaafl.net
To: solr-user@lucene.apache.org
Date: 08/12/2015 06:39 AM


I was pretty sure I tried that, though I thought if you don't specify it
just uses the search terms?

If I just search for calvin and don't specify a field, what do I
assign hl.fl?

Scott

On 8/11/2015 7:27 PM, Erik Hatcher wrote:

Scott - it doesn't look like you've specified hl.fl, which names the field(s)
to highlight.

p.s. Erick Erickson surely likes your e-mail domain :)


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com/





On Aug 11, 2015, at 9:02 PM, Scott Derrick sc...@tnstaafl.net wrote:

I guess I really don't get Highlighting in Solr.

We are transitioning from Google Custom Search which generally sucks,
but does return nicely formatted highlighted fragment.

I

Re: Highlighting, all matches show empty {}

2015-08-12 Thread Erick Erickson
Well, the example you just showed shouldn't show any highlighting. Your query is
q=concord
so it's trying to highlight concord, which isn't in any of your documents.
hl.q can be used to highlight something other than your q parameter.

I did notice in some of your other examples that you seemed to be searching for
terms that were in the fields, so I suspect this isn't really your root
problem though.

Do note that fields _must_ be stored for highlighting to work. Is it possible
that your matches are on fields that aren't stored?

Let's build it up slowly though: try searching on one term in one field that
you _know_ is stored and see if you get anything back. While the query with
hl.fl=* and fl=field1, field2 should be fine, let's start as simply as
possible and work up, maybe?

Best,
Erick
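
As a rough aid for the "start simple" check described above, the sketch below lists which documents in a parsed JSON response came back with no snippets at all. It assumes the response has already been decoded into a Python dict with the usual top-level "highlighting" map of document id to per-field snippet lists; the function name docs_without_snippets and the sample data are hypothetical.

```python
def docs_without_snippets(response):
    """Return the ids whose highlighting entry contains no snippets."""
    highlighting = response.get("highlighting", {})
    return [
        doc_id
        for doc_id, fields in highlighting.items()
        # An entry counts as empty if it has no fields or only empty lists.
        if not any(fields.values())
    ]


# Hypothetical sample: one empty entry, one with a snippet.
sample = {
    "highlighting": {
        "L13943.html": {},
        "L02634.html": {"title": ["<em>Mary</em> Baker Eddy to Josephine C. Woodbury"]},
    }
}
empty_ids = docs_without_snippets(sample)
```

If every id shows up in the result even for a term known to occur in a stored field named in hl.fl, the problem likely lies in the field configuration rather than in the query.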

On Wed, Aug 12, 2015 at 7:59 AM, Scott Derrick sc...@tnstaafl.net wrote:
 I think the highlighter is actually running, but I'm not getting the
 results??

 with this request

 http://localhost:8983/solr/mbepp/select?q=concord&fl=accession%2C+title%2C+author%2C+date&wt=json&indent=true&hl=true&hl.fl=*


 I get this response

 {
   responseHeader:{
 status:0,
 QTime:3,
 params:{
   q:concord,
   hl:true,
   indent:true,
   fl:accession, title, author, date,
   hl.fl:*,
   wt:json}},
   response:{numFound:3,start:0,docs:[
   {
 date:1890-02-26,
 author:Mary Baker Eddy,
 accession:L13943,
 title:[Mary Baker Eddy to Joseph E. Adams,]},
   {
 date:1896-01-13,
 author:Mary Baker Eddy,
 accession:L03453,
 title:[Mary Baker Eddy to Ira O. Knapp,]},
   {
 date:1902-06-15,
 author:Mary Baker Eddy,
 accession:A10145,
 title:[Message of the Pastor Emeritus to The First Church of
 Christ, Scientist, Boston, Mass., June 15, 1902]}]
   },
   highlighting:{

 /home/scott/workspace/mbel-work/tei2html/build/web/L13943/L13943.html:{},

 /home/scott/workspace/mbel-work/tei2html/build/web/L03453/L03453.html:{},

 /home/scott/workspace/mbel-work/tei2html/build/web/A10145/A10145.html:{}}}

 When I ran the request.
 In the admin plubins/Stats I set Watch Changes before processing the
 request.  Highlighting showed 2 changes, the gapFragmenter and HTMLFormatter

 here are the reported changes

 org.apache.solr.highlight.GapFragmenter
 class: org.apache.solr.highlight.GapFragmenter
 version: 5.2.1
 description: GapFragmenter
 stats: requests: Was: 117, Now: 156, Delta: 39

 org.apache.solr.highlight.HtmlFormatter
 class: org.apache.solr.highlight.HtmlFormatter
 version:5.2.1
 description:HtmlFormatter
 stats: requests: Was: 117, Now: 156, Delta: 39

 Looks to me like there were 39 fragments or something processed, yet you can
 see above the highlights are empty {}???

 though all the the other libraries in the highlighter showed no changes.

 which are these...

 org.apache.solr.highlight.BreakIteratorBoundaryScanner
 org.apache.solr.highlight.HtmlEncoder
 org.apache.solr.highlight.RegexFragmenter
 org.apache.solr.highlight.ScoreOrderFragmentsBuilder
 org.apache.solr.highlight.SimpleBoundaryScanner
 org.apache.solr.highlight.SimpleFragListBuilder
 org.apache.solr.highlight.SingleFragListBuilder
 org.apache.solr.highlight.WeightedFragListBuilder


 Scott

  Original Message 
 Subject: Highlighting, all matches show empty {}
 From: Scott Derrick sc...@tnstaafl.net
 To: solr-user@lucene.apache.org
 Date: 08/12/2015 08:20 AM

 Tried submitting a filed for hl.fl still empty {}

 here are the query terms

 responseHeader: {
  status: 0,
  QTime: 8,
  params: {
q: mary or calvin,
hl: true,
hl.simple.post: /em,
indent: true,
fl: accession, title, author, date,
hl.fl: *,
wt: json,
hl.simple.pre: em,
_: 1439388969240
  }

 here is one of the responses, there were 135

 {
  date: 1886-07-06,
  author: Mary Baker Eddy,
  accession: L02634,
  title: [
Mary Baker Eddy to Josephine C. Woodbury, July 6, 1886
  ]
 },

 here is the highlight section listing the first 10 matches, still empty {}

 highlighting: {

 /home/scott/workspace/mbel-work/tei2html/build/web/./L02634/L02634.html:
 {},

 /home/scott/workspace/mbel-work/tei2html/build/web/./A10720/A10720.html:
 {},

 /home/scott/workspace/mbel-work/tei2html/build/web/./L07894/L07894.html:
 {},

 /home/scott/workspace/mbel-work/tei2html/build/web/./L09828/L09828.html:
 {},


 /home/scott/workspace/mbel-work/tei2html/build/web/./A10636D/A10636D.html:
 {},

 /home/scott/workspace/mbel-work/tei2html/build/web/./L13943/L13943.html:
 {},

 /home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html:
 {},


 /home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html:
 {},

 /home/scott/workspace/mbel-work/tei2html/build/web/./A10879

Re: Highlighting, all matches show empty {}

2015-08-12 Thread Scott Derrick

Erick,

that explains it. I figured I didn't understand how solr handled 
highlight fragments.


Most of my documents are just text, or, as Solr names that content, 
_text_, which is not stored by default.


You mention that I was searching for concord and that it's not in any 
documents.  But the results below clearly show 3 hits


response:{numFound:3,start:0,docs:[

the problem is the hits are in _text_

Is there a problem with storing _text_  so I can get a highlight 
fragment when a hit is found there?


Scott

 Original Message 
Subject: Re: Highlighting, all matches show empty {}
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Date: 08/12/2015 09:27 AM


Well, the example you just showed shouldn't show any highlighting. Your query is
q=concord
so it's trying to highlight concord which isn't in any of your
documents. hl.q can be
used to highlight something other than your q parameter.

I did notice in some of your other examples that you seemed to be searching for
terms that were in the fields so I suspect this isn't really your root
problem though.

do note that fields _must_ be stored to have highlighting work. Is it
possible that your
matches are on fields that aren't stored?

Let's build it up slowly though, try searching on one term in one
field that you _know_
is stored and see if you get anything back. While the query with
hl.fl=* and fl=field1, field2,
should be fine, let's start as simply as possible and work up maybe?

Best,
Erick

On Wed, Aug 12, 2015 at 7:59 AM, Scott Derrick sc...@tnstaafl.net wrote:

I think the highlighter is actually running, but I'm not getting the
results??

with this request

http://localhost:8983/solr/mbepp/select?q=concord&fl=accession%2C+title%2C+author%2C+date&wt=json&indent=true&hl=true&hl.fl=*


I get this response

{
   responseHeader:{
 status:0,
 QTime:3,
 params:{
   q:concord,
   hl:true,
   indent:true,
   fl:accession, title, author, date,
   hl.fl:*,
   wt:json}},
   response:{numFound:3,start:0,docs:[
   {
 date:1890-02-26,
 author:Mary Baker Eddy,
 accession:L13943,
 title:[Mary Baker Eddy to Joseph E. Adams,]},
   {
 date:1896-01-13,
 author:Mary Baker Eddy,
 accession:L03453,
 title:[Mary Baker Eddy to Ira O. Knapp,]},
   {
 date:1902-06-15,
 author:Mary Baker Eddy,
 accession:A10145,
 title:[Message of the Pastor Emeritus to The First Church of
Christ, Scientist, Boston, Mass., June 15, 1902]}]
   },
   highlighting:{

/home/scott/workspace/mbel-work/tei2html/build/web/L13943/L13943.html:{},

/home/scott/workspace/mbel-work/tei2html/build/web/L03453/L03453.html:{},

/home/scott/workspace/mbel-work/tei2html/build/web/A10145/A10145.html:{}}}

When I ran the request.
In the admin plubins/Stats I set Watch Changes before processing the
request.  Highlighting showed 2 changes, the gapFragmenter and HTMLFormatter

here are the reported changes

org.apache.solr.highlight.GapFragmenter
 class: org.apache.solr.highlight.GapFragmenter
 version: 5.2.1
 description: GapFragmenter
 stats: requests: Was: 117, Now: 156, Delta: 39

org.apache.solr.highlight.HtmlFormatter
 class: org.apache.solr.highlight.HtmlFormatter
 version:5.2.1
 description:HtmlFormatter
 stats: requests: Was: 117, Now: 156, Delta: 39

Looks to me like there were 39 fragments or something processed, yet you can
see above the highlights are empty {}???

though all the the other libraries in the highlighter showed no changes.

which are these...

 org.apache.solr.highlight.BreakIteratorBoundaryScanner
 org.apache.solr.highlight.HtmlEncoder
 org.apache.solr.highlight.RegexFragmenter
 org.apache.solr.highlight.ScoreOrderFragmentsBuilder
 org.apache.solr.highlight.SimpleBoundaryScanner
 org.apache.solr.highlight.SimpleFragListBuilder
 org.apache.solr.highlight.SingleFragListBuilder
 org.apache.solr.highlight.WeightedFragListBuilder


Scott

 Original Message 
Subject: Highlighting, all matches show empty {}
From: Scott Derrick sc...@tnstaafl.net
To: solr-user@lucene.apache.org
Date: 08/12/2015 08:20 AM


Tried submitting a filed for hl.fl still empty {}

here are the query terms

responseHeader: {
  status: 0,
  QTime: 8,
  params: {
q: mary or calvin,
hl: true,
hl.simple.post: /em,
indent: true,
fl: accession, title, author, date,
hl.fl: *,
wt: json,
hl.simple.pre: em,
_: 1439388969240
  }

here is one of the responses, there were 135

{
  date: 1886-07-06,
  author: Mary Baker Eddy,
  accession: L02634,
  title: [
Mary Baker Eddy to Josephine C. Woodbury, July 6, 1886
  ]
},

here is the highlight section listing the first 10 matches, still empty {}

highlighting: {

/home/scott/workspace

Re: Highlighting, all matches show empty {}

2015-08-12 Thread Erick Erickson
bq: You mention that I was searching for concord and that it's not in
any documents.  But the results below clearly show 3 hits

Right, as you figured out, I _really_ meant that concord isn't in any of
the stored fields you were including in the hl.fl parameter. That could
have been clearer.

bq: Is there a problem with storing _text_  so I can get a highlight
fragment when a hit is found there?

No, you can store the data in the _text_ field just fine, you'll have
to re-index after the change though. It's often more useful to a user
to see the highlights in specific fields though, so I wouldn't throw
the rest of the highlighting away.

You should probably see the FastVectorHighlighter though. If you don't
use FVH, highlighting re-analyzes the raw text to produce the snippets
which may be expensive for large text fields.

Best,
Erick
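
One possible way to express the field change discussed above is a Schema API command, sketched here as a JSON payload built in Python. Several things are assumptions rather than facts from this thread: that the core uses a managed schema, so the replace-field command is available; that the field type is called text_general; and that _text_ is multiValued. The term vector attributes are the ones the FastVectorHighlighter needs. The helper name stored_field_command is hypothetical, and nothing is sent to Solr here.

```python
import json


def stored_field_command(name, field_type, with_term_vectors=False):
    """Build a Schema API 'replace-field' command that makes a field stored."""
    field = {
        "name": name,
        "type": field_type,      # assumption: must match the existing field type
        "indexed": True,
        "stored": True,          # required for highlighting to return snippets
        "multiValued": True,     # assumption: matches the existing definition
    }
    if with_term_vectors:
        # Needed only if the FastVectorHighlighter is going to be used.
        field.update({
            "termVectors": True,
            "termPositions": True,
            "termOffsets": True,
        })
    return {"replace-field": field}


command = stored_field_command("_text_", "text_general", with_term_vectors=True)
payload = json.dumps(command, indent=2)
print(payload)
```

As noted in the message, the documents have to be re-indexed after a change like this before any snippets can come back.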


On Wed, Aug 12, 2015 at 8:46 AM, Scott Derrick sc...@tnstaafl.net wrote:
 Erick,

 that explains it. I figured I didn't understand how solr handled highlight
 fragments.

 Most of my documents are just text. or as solr specifies that content
 _text_, which is not stored, by default.

 You mention the I was searching for concord and that its not in any
 documents.  But the results below clearly show 3 hits

response:{numFound:3,start:0,docs:[

 the problem is the hits are in _text_

 Is there a problem with storing _text_  so I can get a highlight fragment
 when a hit is found there?

 Scott

  Original Message 
 Subject: Re: Highlighting, all matches show empty {}
 From: Erick Erickson erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 Date: 08/12/2015 09:27 AM

 Well, the example you just showed shouldn't show any highlighting. Your
 query is
 q=concord
 so it's trying to highlight concord which isn't in any of your
 documents. hl.q can be
 used to highlight something other than your q parameter.

 I did notice in some of your other examples that you seemed to be
 searching for
 terms that were in the fields so I suspect this isn't really your root
 problem though.

 do note that fields _must_ be stored to have highlighting work. Is it
 possible that your
 matches are on fields that aren't stored?

 Let's build it up slowly though, try searching on one term in one
 field that you _know_
 is stored and see if you get anything back. While the query with
 hl.fl=* and fl=field1, field2,
 should be fine, let's start as simply as possible and work up maybe?

 Best,
 Erick

 On Wed, Aug 12, 2015 at 7:59 AM, Scott Derrick sc...@tnstaafl.net wrote:

 I think the highlighter is actually running, but I'm not getting the
 results??

 with this request


 http://localhost:8983/solr/mbepp/select?q=concord&fl=accession%2C+title%2C+author%2C+date&wt=json&indent=true&hl=true&hl.fl=*


 I get this response

 {
responseHeader:{
  status:0,
  QTime:3,
  params:{
q:concord,
hl:true,
indent:true,
fl:accession, title, author, date,
hl.fl:*,
wt:json}},
response:{numFound:3,start:0,docs:[
{
  date:1890-02-26,
  author:Mary Baker Eddy,
  accession:L13943,
  title:[Mary Baker Eddy to Joseph E. Adams,]},
{
  date:1896-01-13,
  author:Mary Baker Eddy,
  accession:L03453,
  title:[Mary Baker Eddy to Ira O. Knapp,]},
{
  date:1902-06-15,
  author:Mary Baker Eddy,
  accession:A10145,
  title:[Message of the Pastor Emeritus to The First Church of
 Christ, Scientist, Boston, Mass., June 15, 1902]}]
},
highlighting:{


 /home/scott/workspace/mbel-work/tei2html/build/web/L13943/L13943.html:{},


 /home/scott/workspace/mbel-work/tei2html/build/web/L03453/L03453.html:{},


 /home/scott/workspace/mbel-work/tei2html/build/web/A10145/A10145.html:{}}}

 When I ran the request.
 In the admin plubins/Stats I set Watch Changes before processing the
 request.  Highlighting showed 2 changes, the gapFragmenter and
 HTMLFormatter

 here are the reported changes

 org.apache.solr.highlight.GapFragmenter
  class: org.apache.solr.highlight.GapFragmenter
  version: 5.2.1
  description: GapFragmenter
  stats: requests: Was: 117, Now: 156, Delta: 39

 org.apache.solr.highlight.HtmlFormatter
  class: org.apache.solr.highlight.HtmlFormatter
  version:5.2.1
  description:HtmlFormatter
  stats: requests: Was: 117, Now: 156, Delta: 39

 Looks to me like 39 fragments or something were processed, yet you can
 see above that the highlights are empty {}???

 All the other libraries in the highlighter showed no changes.

 They are these...

  org.apache.solr.highlight.BreakIteratorBoundaryScanner
  org.apache.solr.highlight.HtmlEncoder
  org.apache.solr.highlight.RegexFragmenter
  org.apache.solr.highlight.ScoreOrderFragmentsBuilder
  org.apache.solr.highlight.SimpleBoundaryScanner
  org.apache.solr.highlight.SimpleFragListBuilder

Re: Highlighting

2015-08-12 Thread Scott Derrick

yeah, I'm partial to it too! :-)

On 8/11/2015 7:29 PM, Erick Erickson wrote:

bq: Erick Erickson surely likes your e-mail domain :)

Yep, I envy that one!

On Tue, Aug 11, 2015 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

Scott - it doesn’t look like you’ve specified hl.fl, which specifies the field(s) to
highlight.

p.s. Erick Erickson surely likes your e-mail domain :)


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/





Re: Highlighting

2015-08-12 Thread Scott Derrick
I was pretty sure I tried that, though I thought that if you don't specify it,
it just uses the search terms?


If I just search for calvin and don't specify a field, what do I 
assign hl.fl?


Scott

On 8/11/2015 7:27 PM, Erik Hatcher wrote:

Scott - it doesn’t look like you’ve specified hl.fl, which specifies the field(s) to
highlight.

p.s. Erick Erickson surely likes your e-mail domain :)


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/





Highlighting, all matches show empty {}

2015-08-12 Thread Scott Derrick

Tried submitting a field for hl.fl, still empty {}

here are the query terms

"responseHeader": {
  "status": 0,
  "QTime": 8,
  "params": {
    "q": "mary or calvin",
    "hl": "true",
    "hl.simple.post": "</em>",
    "indent": "true",
    "fl": "accession, title, author, date",
    "hl.fl": "*",
    "wt": "json",
    "hl.simple.pre": "<em>",
    "_": "1439388969240"
  }

here is one of the responses, there were 135

{
  "date": "1886-07-06",
  "author": "Mary Baker Eddy",
  "accession": "L02634",
  "title": [
    "Mary Baker Eddy to Josephine C. Woodbury, July 6, 1886"
  ]
},

here is the highlight section listing the first 10 matches, still empty {}

"highlighting": {
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L02634/L02634.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10720/A10720.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L07894/L07894.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L09828/L09828.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10636D/A10636D.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L13943/L13943.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10879/A10879.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L3/L3.html": {}
}
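
When there are 135 hits, picking out the empty entries by eye gets tedious. The following is a minimal Python sketch, assuming the JSON response has already been parsed into a dict with a top-level "highlighting" key as in the output above. The function name empty_highlights and the sample document keys are hypothetical.

```python
def empty_highlights(response):
    """Return the document keys whose highlighting entry has no snippets."""
    highlighting = response.get("highlighting", {})
    # An entry like "doc-key": {} means no field produced a snippet.
    return [doc_id for doc_id, snippets in highlighting.items() if not snippets]


# Hypothetical sample shaped like the response in this thread.
sample = {
    "highlighting": {
        "/docs/a.html": {},
        "/docs/b.html": {"title": ["<em>Mary</em> Baker Eddy"]},
    }
}

print(empty_highlights(sample))
```

If every key comes back in that list, as it does here, the problem is more likely in the fields being highlighted than in the query terms.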


 Original Message 
Subject: Re: Highlighting
From: Scott Derrick sc...@tnstaafl.net
To: solr-user@lucene.apache.org
Date: 08/12/2015 06:39 AM


I was pretty sure I tried that, though I thought that if you don't specify it,
it just uses the search terms?

If I just search for calvin and don't specify a field, what do I
assign hl.fl?

Scott

On 8/11/2015 7:27 PM, Erik Hatcher wrote:

Scott - it doesn’t look like you’ve specified hl.fl, which specifies the field(s)
to highlight.

p.s. Erick Erickson surely likes your e-mail domain :)


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/





Re: Highlighting

2015-08-11 Thread Erik Hatcher
Scott - it doesn’t look like you’ve specified hl.fl, which specifies the field(s) to
highlight.

p.s. Erick Erickson surely likes your e-mail domain :)


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/




Highlighting

2015-08-11 Thread Scott Derrick

I guess I really don't get Highlighting in Solr.

We are transitioning from Google Custom Search, which generally sucks
but does return nicely formatted highlighted fragments.


I turn highlighting on with hl=true in the query and I get a highlighting
section returned at the bottom of the page, with each entry identified by the
document file name and an empty {}.  It doesn't matter what I search
for (plain text, a field): I get a list of documents, each followed by an
empty brace.


"highlighting": {
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10089/A10089.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L3/L3.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10646/A10646.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./V03482/V03482.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./645A.66.043/645A.66.043.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./352.48.001/352.48.001.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./144.23.001/144.23.001.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L18512/L18512.html": {}
}

I haven't made any changes to the default settings

    <highlighting>
      <!-- Configure the standard fragmenter -->
      <!-- This could most likely be commented out in the default case -->
      <fragmenter name="gap"
                  default="true"
                  class="solr.highlight.GapFragmenter">
        <lst name="defaults">
          <int name="hl.fragsize">100</int>
        </lst>
      </fragmenter>

      <!-- A regular-expression-based fragmenter
           (for sentence extraction)
        -->
      <fragmenter name="regex"
                  class="solr.highlight.RegexFragmenter">
        <lst name="defaults">
          <!-- slightly smaller fragsizes work better because of slop -->
          <int name="hl.fragsize">70</int>
          <!-- allow 50% slop on fragment sizes -->
          <float name="hl.regex.slop">0.5</float>
          <!-- a basic sentence pattern -->
          <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
        </lst>
      </fragmenter>

      <!-- Configure the standard formatter -->
      <formatter name="html"
                 default="true"
                 class="solr.highlight.HtmlFormatter">
        <lst name="defaults">
          <str name="hl.simple.pre"><![CDATA[<em>]]></str>
          <str name="hl.simple.post"><![CDATA[</em>]]></str>
        </lst>
      </formatter>

      <!-- Configure the standard encoder -->
      <encoder name="html"
               class="solr.highlight.HtmlEncoder" />

      <!-- Configure the standard fragListBuilder -->
      <fragListBuilder name="simple"
                       class="solr.highlight.SimpleFragListBuilder"/>

      <!-- Configure the single fragListBuilder -->
      <fragListBuilder name="single"
                       class="solr.highlight.SingleFragListBuilder"/>

      <!-- Configure the weighted fragListBuilder -->
      <fragListBuilder name="weighted"
                       default="true"
                       class="solr.highlight.WeightedFragListBuilder"/>

      <!-- default tag FragmentsBuilder -->
      <fragmentsBuilder name="default"
                        default="true"
                        class="solr.highlight.ScoreOrderFragmentsBuilder">
        <!--
        <lst name="defaults">
          <str name="hl.multiValuedSeparatorChar">/</str>
        </lst>
        -->
      </fragmentsBuilder>

      <!-- multi-colored tag FragmentsBuilder -->
      <fragmentsBuilder name="colored"
                        class="solr.highlight.ScoreOrderFragmentsBuilder">
        <lst name="defaults">
          <str name="hl.tag.pre"><![CDATA[
               <b style="background:yellow">,<b style="background:lawgreen">,
               <b style="background:aquamarine">,<b style="background:magenta">,
               <b style="background:palegreen">,<b style="background:coral">,
               <b style="background:wheat">,<b style="background:khaki">,
               <b style="background:lime">,<b style="background:deepskyblue">]]></str>
          <str name="hl.tag.post"><![CDATA[</b>]]></str>
        </lst>
      </fragmentsBuilder>

      <boundaryScanner name="default"
                       default="true"
                       class="solr.highlight.SimpleBoundaryScanner">
        <lst name="defaults">
          <str name="hl.bs.maxScan">10</str>
          <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
        </lst>
      </boundaryScanner>

      <boundaryScanner name="breakIterator"
                       class="solr.highlight.BreakIteratorBoundaryScanner">
        <lst name="defaults">
          <!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
          <str name="hl.bs.type">WORD</str>
          <!-- language and country are used when constructing Locale object.  -->
          <!-- And the Locale object will be used when getting instance of BreakIterator -->
          <str name="hl.bs.language">en</str>
          <str name="hl.bs.country">US</str>

Re: Highlighting

2015-08-11 Thread Erick Erickson
bq: Erick Erickson surely likes your e-mail domain :)

Yep, I envy that one!

On Tue, Aug 11, 2015 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 Scott - it doesn’t look like you’ve specified hl.fl, which specifies the field(s) to
 highlight.

 p.s. Erick Erickson surely likes your e-mail domain :)


 —
 Erik Hatcher, Senior Solutions Architect
 http://www.lucidworks.com http://www.lucidworks.com/




Re: Solr 5.2.1 highlighting results are not available

2015-08-05 Thread Ahmet Arslan
Hi,

Your response says wt=json, but your solrconfig excerpt says wt=velocity.
Maybe you are hitting a different request handler?

What happens when you submit your query as q=Warszawa&df=text_index?




On Wednesday, August 5, 2015 8:28 AM, Michał Oleś michal.o...@gmail.com wrote:
I installed Solr 5.2.1 and use the DIH example with Tika integration to search
PDF content. Everything works as expected except the highlighting plugin.
When I execute the query I don't even see a highlighting section in the results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "text_index:Warszawa",
      "_": "1438704448534",
      "hl.simple.pre": "<em>",
      "hl.simple.post": "</em>",
      "hl.fl": "text",
      "wt": "json",
      "hl": "true"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "size": 698231,
        "lastModified": "Tue Aug 04 07:38:07 UTC 2015",
        "id": "C:\\Moje\\solr-5.2.1\\pdf\\D2015000105301.pdf",
        "text": [
          "\n  \n \n\nDZIENNIK USTAW \nRZECZYPOSPOLITEJ POLSKIEJ \n\nWarszawa, dnia 29 lipca 2015 r. \n\nPoz. 1053 \n\nRO ZPORZĄDZENIE \n\nMINISTRA OBRONY NARODOWEJ \n\nz dnia 9 lipca 2015 r. \n\n"
        ],
        "title": [
          "Pozycja 1053 DPA.555.14.2015 JS (word)"
        ],
        "author": "jswiderska"
      },
      {
        "size": 747618,
        "lastModified": "Tue Aug 04 07:37:02 UTC 2015",
        "id": "C:\\Moje\\solr-5.2.1\\pdf\\D2015000109301.pdf",
        "text": [
          "\n  \n \n\nDZIENNIK USTAW \nRZECZYPOSPOLITEJ POLSKIEJ \n\nWarszawa, dnia 3 sierpnia 2015 r. \n\n"
        ],
        "title": [
          "OGŁ - SZCZOTKA 1093"
        ],
        "author": "bzebrowska"
      }
    ]
  }
}

My solrconfig.xml is the default from that example. I tried adding default
values, but it didn't change anything:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>

    <!-- Query settings -->
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>

    <!-- Faceting defaults -->
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">text</str>
    <str name="hl.preserveMulti">true</str>
    <str name="hl.encoder">html</str>
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>
    <str name="f.text.hl.snippets">3</str>
    <str name="f.text.hl.fragsize">200</str>
    <str name="f.text.hl.alternateField">text</str>
    <str name="f.text.hl.maxAlternateFieldLength">750</str>
  </lst>
</requestHandler>

Here is part of schema.xml:

<field name="text" type="text_general" indexed="false" stored="true"
       multiValued="true"/>
<field name="text_index" type="text_general" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="text" dest="text_index"/>

As in the example, I use two fields (one for indexing and one for storing the
value). When I ran with debug I found that the highlight plugin time = 0, so it
looks like this plugin isn't even invoked. Also, in the Solr admin panel under
the Plugins/Stats tab, all org.apache.solr.highlight.* classes show 0 requests.


Re: Solr 5.2.1 highlighting results are not available

2015-08-05 Thread Michał Oleś
Thank you for the answer. When I execute the query using q=Warszawa&df=text_index
instead of q=text_index:Warszawa, nothing changes. If I remove wt=json from the
query, I get the response in XML, but still without highlighting results.


Re: Solr 5.2.1 highlighting results are not available

2015-08-05 Thread Ahmet Arslan
Hi,

bq: I don't even see highlighting section in results

I mean, it is possible that you are hitting a request/search handler that does
not have the highlighting component registered. This can happen when you
explicitly register components (query, facet, highlighting, etc.).

Let's first make sure it is in the components. When you add debug=true to your
URL, do you see any info about the highlighting component?
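
One way to run that check on a parsed debug=true response is sketched below in Python. The layout is an assumption to verify against your own output: it expects a debug section with a timing block whose prepare and process stages hold one entry per search component, each with a time value. The function name component_times and the sample numbers are hypothetical.

```python
def component_times(response, stage="process"):
    """Map each component listed under debug.timing.<stage> to its time."""
    timing = response.get("debug", {}).get("timing", {})
    stage_info = timing.get(stage, {})
    # Skip the stage's own "time" total; keep only per-component entries.
    return {
        name: info["time"]
        for name, info in stage_info.items()
        if isinstance(info, dict) and "time" in info
    }


# Hypothetical sample following the assumed layout.
sample = {
    "debug": {
        "timing": {
            "time": 1.0,
            "process": {
                "time": 1.0,
                "query": {"time": 1.0},
                "highlight": {"time": 0.0},
            },
        }
    }
}

times = component_times(sample)
print("highlight" in times, times)
```

If "highlight" is absent from the result, the handler probably does not have the component registered. If it is present with a time of 0, the component is registered but may not be doing any work.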



On Wednesday, August 5, 2015 7:12 PM, Michał Oleś michal.o...@gmail.com wrote:
Thank you for the answer. When I execute the query using q=Warszawa&df=text_index
instead of q=text_index:Warszawa, nothing changes. If I remove wt=json from the
query, I get the response in XML, but still without highlighting results.


Re: Solr 5.2.1 highlighting results are not available

2015-08-05 Thread Michał Oleś
Hi,
I checked, and to me the config looks all right, but it would be great if you
could take a look.

Here is whole solrconfig.xml:
http://pastebin.com/7YfVZA90

and here is full schema.xml:
http://pastebin.com/LgeAvtFf

and query result with enabled debug:
http://pastebin.com/i74Wyep3


Re: Solr 5.2.1 highlighting results are not available

2015-08-05 Thread Ahmet Arslan
Hi,

I couldn't find anything suspicious. It used to be allowed to highlight on an
indexed=false field as long as a tokenizer is defined on it:
https://cwiki.apache.org/confluence/display/solr/Field+Properties+by+Use+Case

Maybe that has changed. Can you try to highlight on a field that is both
indexed and stored?

Ahmet
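
To see at a glance which fields are both indexed and stored, the field definitions can be read with a few lines of Python. This is a rough sketch only: the two fields are the ones from the schema excerpt quoted in this thread, the schema wrapper element is added so the snippet parses on its own, and field_flags is a hypothetical helper. A missing attribute is treated as false here, which is a simplification, since Solr falls back to the field type's defaults.

```python
import xml.etree.ElementTree as ET

# Field definitions from the thread, wrapped in a root element for parsing.
SCHEMA_EXCERPT = """
<schema>
  <field name="text" type="text_general" indexed="false" stored="true" multiValued="true"/>
  <field name="text_index" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="text" dest="text_index"/>
</schema>
"""


def field_flags(schema_xml):
    """Return {field name: {"indexed": bool, "stored": bool}}."""
    root = ET.fromstring(schema_xml.strip())
    return {
        field.get("name"): {
            # Simplification: an absent attribute counts as false.
            "indexed": field.get("indexed") == "true",
            "stored": field.get("stored") == "true",
        }
        for field in root.iter("field")
    }


flags = field_flags(SCHEMA_EXCERPT)
both = [name for name, f in flags.items() if f["indexed"] and f["stored"]]
print(flags)
print(both)
```

With the schema from this thread, neither field is both indexed and stored, so the test suggested above needs a field definition changed or added.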


On Wednesday, August 5, 2015 10:41 PM, Michał Oleś michal.o...@gmail.com 
wrote:
Hi,
I checked, and to me the config looks all right, but it would be great if you
could take a look.

Here is whole solrconfig.xml:
http://pastebin.com/7YfVZA90

and here is full schema.xml:
http://pastebin.com/LgeAvtFf

and query result with enabled debug:
http://pastebin.com/i74Wyep3


Solr 5.2.1 highlighting results are not available

2015-08-04 Thread Michał Oleś
I installed Solr 5.2.1 and use the DIH example with Tika integration to search
PDF content. Everything works as expected except the highlighting plugin.
When I execute the query I don't even see a highlighting section in the results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "text_index:Warszawa",
      "_": "1438704448534",
      "hl.simple.pre": "<em>",
      "hl.simple.post": "</em>",
      "hl.fl": "text",
      "wt": "json",
      "hl": "true"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "size": 698231,
        "lastModified": "Tue Aug 04 07:38:07 UTC 2015",
        "id": "C:\\Moje\\solr-5.2.1\\pdf\\D2015000105301.pdf",
        "text": [
          "\n  \n \n\nDZIENNIK USTAW \nRZECZYPOSPOLITEJ POLSKIEJ \n\nWarszawa, dnia 29 lipca 2015 r. \n\nPoz. 1053 \n\nRO ZPORZĄDZENIE \n\nMINISTRA OBRONY NARODOWEJ \n\nz dnia 9 lipca 2015 r. \n\n"
        ],
        "title": [
          "Pozycja 1053 DPA.555.14.2015 JS (word)"
        ],
        "author": "jswiderska"
      },
      {
        "size": 747618,
        "lastModified": "Tue Aug 04 07:37:02 UTC 2015",
        "id": "C:\\Moje\\solr-5.2.1\\pdf\\D2015000109301.pdf",
        "text": [
          "\n  \n \n\nDZIENNIK USTAW \nRZECZYPOSPOLITEJ POLSKIEJ \n\nWarszawa, dnia 3 sierpnia 2015 r. \n\n"
        ],
        "title": [
          "OGŁ - SZCZOTKA 1093"
        ],
        "author": "bzebrowska"
      }
    ]
  }
}

My solrconfig.xml is the default from that example. I tried adding highlighting
defaults, but that didn't change anything:

  <requestHandler name="/browse" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>

      <!-- VelocityResponseWriter settings -->
      <str name="wt">velocity</str>
      <str name="v.template">browse</str>
      <str name="v.layout">layout</str>

      <!-- Query settings -->
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>

      <!-- Faceting defaults -->
      <str name="facet">on</str>
      <str name="facet.mincount">1</str>

      <!-- Highlighting defaults -->
      <str name="hl">on</str>
      <str name="hl.fl">text</str>
      <str name="hl.preserveMulti">true</str>
      <str name="hl.encoder">html</str>
      <str name="hl.simple.pre">&lt;b&gt;</str>
      <str name="hl.simple.post">&lt;/b&gt;</str>
      <str name="f.text.hl.snippets">3</str>
      <str name="f.text.hl.fragsize">200</str>
      <str name="f.text.hl.alternateField">text</str>
      <str name="f.text.hl.maxAlternateFieldLength">750</str>
    </lst>
  </requestHandler>

Here is part of schema.xml:

<field name="text" type="text_general" indexed="false" stored="true"
       multiValued="true"/>
<field name="text_index" type="text_general" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="text" dest="text_index"/>

As in the example, I use two fields (one for indexing and one for storing the
value). When I run with debug I see that the highlight plugin time is 0, so it
looks like the plugin isn't even invoked. Also, in the Solr admin panel under the
Plugins/Stats tab, all org.apache.solr.highlight.* classes show 0 requests.


Problem with Highlighting results

2015-07-29 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.2.1, and sometimes the highlighting returns results in which
none of the fields listed in hl.fl contains a correct match, and there is no
<em> tag in the results at all.

What could be the reason for this?

I've included my highlighting request handler here.

  <requestHandler name="/highlight" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
      <str name="fl">id, title, content_type, last_modified, url, score</str>

      <str name="hl">on</str>
      <str name="hl.fl">id, title, content, author, tag</str>
      <str name="hl.highlightMultiTerm">true</str>
      <str name="hl.preserveMulti">true</str>
      <str name="hl.encoder">html</str>
      <str name="hl.fragsize">200</str>
      <str name="hl.regex.slop">0.6</str>
    </lst>
  </requestHandler>

Regards,
Edwin


Re: Highlighting pre and post tags not working

2015-07-13 Thread Upayavira
You need to XML-encode the tags. So instead of <em>, put &lt;em&gt;
and instead of </em> put &lt;/em&gt;

Upayavira



Re: Highlighting pre and post tags not working

2015-07-13 Thread Erick Erickson
Try
<str name="hl.simple.pre">&lt;em&gt;</str>
or
<str name="hl.simple.pre"><![CDATA[<em>]]></str>

The bare < and > confuse the XML parsing.

Best
Erick



Re: Highlighting pre and post tags not working

2015-07-13 Thread Erik Hatcher
Within XML, angle brackets must be escaped as &lt; and &gt;
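
For anyone generating these config values programmatically, the escaping
described above can be sketched in Java. This is only an illustration:
XmlEscapeSketch and escapeXml are hypothetical names, not part of Solr, and the
helper covers just the three characters that matter inside element text.

```java
// Minimal sketch: escape the characters that are special in XML element text,
// so that a tag such as <em> can be stored as a value in solrconfig.xml.
final class XmlEscapeSketch {

    // Hypothetical helper (not a Solr API). '&' is handled in the same pass,
    // so already-produced entities are never escaped twice.
    static String escapeXml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("<em>"));   // prints &lt;em&gt;
        System.out.println(escapeXml("</em>"));  // prints &lt;/em&gt;
    }
}
```

The escaped strings are what would go between the str tags for
hl.simple.pre and hl.simple.post.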








Highlighting pre and post tags not working

2015-07-13 Thread Paden
Hello,

I'm trying to get some Solr highlighting going but I've run into a small
problem. When I set the pre and post tags with my own custom tag I get an
XML error

XML Parsing Error: mismatched tag. Expected: </em>.
Location:
file:///home/paden/Downloads/solr-5.1.0/server/solr/Testcore2/conf/solrconfig.xml
Line Number 476, Column 40:   <str name="hl.simple.pre"><em></str>

I've seen it done like this on a lot of other sites and I'm not sure if
I'm missing an escape character or something. Just to emphasize that I did
set a POST tag, I put it right after the pre in solrconfig.xml like so

<str name="hl.simple.pre"><em></str>
<str name="hl.simple.post"></em></str> 

What am I doing wrong here? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-pre-and-post-tags-not-working-tp4217090.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem with distributed search using grouping and highlighting

2015-07-09 Thread Cario, Elaine
Rich,

I've run into various problems with group.query and highlighting.  You noted 
one below (SOLR-5046), and there is also SOLR-6712, which might be related to 
what you are experiencing.  Still waiting for that patch to be reviewed...

-----Original Message-----
From: Rich Hume [mailto:rh...@identifix.com] 
Sent: Monday, June 08, 2015 2:23 PM
To: solr-user@lucene.apache.org
Subject: Problem with distributed search using grouping and highlighting

I am currently using Solr 4.5.1.  In the hopes of seeing better query 
performance, I have sharded an index and I am trying to use the shards 
parameter along with grouping and highlighting.  I am not currently using Solr 
cloud.

I got past an earlier problem by adding a second sort parameter (as described 
in JIRA SOLR-5046).  Unfortunately, I have found nothing related to my latest 
index out of bounds problem.  I do not believe that JIRA SOLR-5709 is related 
since my unique keys are in fact unique across the shards.

If anyone can point out something that I am doing wrong it would be greatly 
appreciated.

Thanks,
Rich

I am seeing the following error, the parameters I am passing are below the 
stack trace.

null:java.lang.ArrayIndexOutOfBoundsException: 35
 at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:185)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)


Here are the parameters I am passing:

group=true&group.offset=0&group.limit=10&group.field=DeDup
&group.query=DocumentTypes:34&group.query=DocumentTypes:35&group.query=DocumentTypes:32
&shards=localhost:8983/solr/IX1,localhost:8983/solr/IX2
&fq=+DocumentTypes:(34 35 32)
&defType=edismax&qf=csTitle^100 csContent&q=any matching search string
&start=0&rows=10
&fl=PageNumber,FilePath,DocumentGUID,ResultDisplayContent,DocumentTypes
&sort=score desc,DocumentGUID asc
&hl=on
&hl.fl=csTitle,csContent
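
When a parameter list like this is assembled by hand, the separators and the
special characters (spaces, +, ^, parentheses) are easy to lose or mangle. A
minimal sketch of building the query string in Java follows. It assumes plain
java.net.URLEncoder rather than SolrJ, and QueryStringSketch, pair and join are
hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: join name/value pairs with '&' and URL-encode each part.
// A list of pairs (not a map) is used so that a parameter such as
// group.query can be repeated.
final class QueryStringSketch {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java runtime.
            throw new IllegalStateException(e);
        }
    }

    static String pair(String name, String value) {
        return encode(name) + "=" + encode(value);
    }

    static String join(List<String[]> params) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(pair(p[0], p[1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> params = new ArrayList<>();
        params.add(new String[] {"group.query", "DocumentTypes:34"});
        params.add(new String[] {"group.query", "DocumentTypes:35"});
        params.add(new String[] {"fq", "+DocumentTypes:(34 35 32)"});
        params.add(new String[] {"hl", "on"});
        params.add(new String[] {"hl.fl", "csTitle,csContent"});
        System.out.println(join(params));
    }
}
```

URLEncoder produces form encoding, so a space becomes "+" and a literal "+"
becomes "%2B"; that distinction matters for the fq value above.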




boolean for if highlighting snippet contains complete original value or is truncated

2015-07-07 Thread Philip Durbin
I've been playing around with highlighting snippets and there are
times when I'd like to know if the snippet returned contains the
entire value of the original field or if the snippet is a truncated
version of the original field.

For example, when I search for xms below I can tell that the snippet
returned is truncated:

- original: CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered
DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail
- snippet: CORSAIR  <em>XMS</em> 2GB (2 x 1GB) 184-Pin DDR SDRAM
Unbuffered DDR 400 (PC 3200) Dual Channel Kit System

Obviously, I can compare the original to the snippet (stripping out
the <em> tags) to see if they are the same, but does Solr natively
support returning a boolean if the values are equal? I couldn't find
anything at https://wiki.apache.org/solr/HighlightingParameters

Maybe the boolean would say truncated=true or something.

Here's the example:

$ curl 'http://localhost:8983/solr/collection1/select?wt=json&indent=true&hl=true&hl.fl=*&q=xms'
{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "indent":"true",
      "q":"xms",
      "hl.fl":"*",
      "wt":"json",
      "hl":"true"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"TWINX2048-3200PRO",
        "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
        "manu":"Corsair Microsystems Inc.",
        "manu_id_s":"corsair",
        "cat":["electronics",
          "memory"],
        "features":["CAS latency 2,\t2-3-3-6 timing, 2.75v, unbuffered, heat-spreader"],
        "price":185.0,
        "price_c":"185,USD",
        "popularity":5,
        "inStock":true,
        "store":"37.7752,-122.4232",
        "manufacturedate_dt":"2006-02-13T15:26:37Z",
        "payloads":"electronics|6.0 memory|3.0",
        "_version_":1506070286991097856}]
  },
  "highlighting":{
    "TWINX2048-3200PRO":{
      "name":["CORSAIR  <em>XMS</em> 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System"]}}}

My use case is only including ellipses (...) when the snippet is
truncated: https://github.com/IQSS/dataverse/issues/537
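
The client-side comparison mentioned above can be sketched in Java as follows.
This is a sketch under assumptions, not a Solr feature: it assumes the pre/post
markers are <em> and </em>, that the snippet is not HTML-encoded by the
highlighter, and that the snippet is cut at the end only.
SnippetTruncationSketch and its methods are hypothetical names.

```java
// Minimal sketch: decide on the client whether a highlighting snippet covers
// the whole stored value, and add an ellipsis only when it does not.
final class SnippetTruncationSketch {

    // Remove the highlighter's pre/post markers from a snippet.
    static String stripTags(String snippet, String pre, String post) {
        return snippet.replace(pre, "").replace(post, "");
    }

    // True when the snippet, with markers removed, differs from the original.
    static boolean isTruncated(String original, String snippet,
                               String pre, String post) {
        return !stripTags(snippet, pre, post).equals(original);
    }

    // Append "..." only for truncated snippets (end-truncation assumed).
    static String withEllipsis(String original, String snippet,
                               String pre, String post) {
        return isTruncated(original, snippet, pre, post)
                ? snippet + "..."
                : snippet;
    }

    public static void main(String[] args) {
        String original = "CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered "
                + "DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail";
        String snippet = "CORSAIR  <em>XMS</em> 2GB (2 x 1GB) 184-Pin DDR SDRAM "
                + "Unbuffered DDR 400 (PC 3200) Dual Channel Kit System";
        System.out.println(isTruncated(original, snippet, "<em>", "</em>")); // prints true
        System.out.println(withEllipsis(original, snippet, "<em>", "</em>"));
    }
}
```

A snippet that starts in the middle of the value would need a leading ellipsis
as well; that case is left out here to keep the sketch short.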

Thanks,

Phil

-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin


Re: issue with highlighting in solr 4.10.2

2015-06-29 Thread Dmitry Kan
Hi Erick,

The Contents field contains only one sentence, and "watch" does not occur in it.
Also, we use a snippet size large enough to cover the whole field.

Dmitry





-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: issue with highlighting in solr 4.10.2

2015-06-27 Thread Erick Erickson
Does watch exist in the Contents field somewhere outside the snippet
size you've specified?

Shot in the dark,
Erick



issue with highlighting in solr 4.10.2

2015-06-26 Thread Dmitry Kan
Hi,

When highlighting hits for the following query:

(+Contents:apple +Contents:watch) Contents:iphone

I expect the standard Solr highlighter to highlight either iphone alone, or
iphone and apple, the latter only if watch is also present.

However, solr highlights iphone along with only apple. Is this a bug or a
known feature? Is there any way to debug the highlighter using solr admin?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Inconsistent Solr highlighting

2015-06-15 Thread Becker Moritz
Hi,

I have the requirement to index internationalized fields ('name') with Solr.
For this purpose, I want to use dynamic fields and have e.g. 'name_en', 
'name_de', 'name_fr' in my Solr documents.

When querying the index, I need to know which language a match was found in. 
For this, I want to use Solr highlighting.

My problem is that the highlighting seems to work inconsistently, which breaks 
my use case.
The field configuration for, e.g., my dynamic field '*_en' is as follows:

<dynamicField name="*_en" type="text_en" indexed="true" stored="true" 
multiValued="false"/>

The field type 'text_en' is configured as follows:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" 
            ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" 
            protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive 
         stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" 
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" 
            protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive 
         stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

My index contains the following document:

<doc>
  <int name="id">25</int>
  <str name="name_it">Note Test</str>
  <str name="description_it"/>
  <str name="name_en">Note Test Translation</str>
  <str name="description_en"/>
  <long name="_version_">1504065955969368064</long>
</doc>

The query defType=edismax&q=Translation&hl=on&hl.fl=name_* returns the above 
document but does not highlight anything.
The query defType=edismax&q=name_en:Translation&hl=on&hl.fl=name_* returns the 
above document AND highlights 'Translation' as expected.
Since 'Translation' does not occur in any other field, I do not understand how 
the match could have occurred on a field other than 'name_en' (which would 
explain why 'name_en' is not highlighted).
I already tried the suggestions from:
http://stackoverflow.com/questions/23755097/solr-highlighting-hl-simple-pre-post-doesnt-appear-sometime
http://lucene.472066.n3.nabble.com/Urgent-Highlighting-not-working-as-expected-td3983755.html
http://stackoverflow.com/questions/9842886/why-is-this-simple-solr-highlighting-attempt-failing

Neither worked.

Moreover, when I run defType=edismax&q=Note&hl=on&hl.fl=name_* the result is
<doc>
  <int name="id">25</int>
  <str name="name_it">Note Test</str>
  <str name="description_it"/>
  <str name="name_en">Note Test Translation</str>
  <str name="description_en"/>
  <long name="_version_">1504067222466723840</long>
</doc>
<doc>
  <int name="id">27</int>
  <str name="name_de">Note Test child</str>
  <str name="description_de"/>
  <long name="_version_">1504067222528589824</long>
</doc>

However, the highlighting only contains fields of document 25 but not 27:

<lst name="highlighting">
  <lst name="25">
    <arr name="name_it">
      <str>&lt;em&gt;Note&lt;/em&gt; Test</str>
    </arr>
    <arr name="name_en">
      <str>&lt;em&gt;Note&lt;/em&gt; Test Translation</str>
    </arr>
  </lst>
  <lst name="27"/>
</lst>
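
Given a highlighting section like the one above, the "which language matched"
lookup could be done on the client roughly as sketched below. The sketch works
on a plain Map shaped like the per-document highlighting entry (field name to
list of snippets); MatchedLanguagesSketch and matchedLanguages are hypothetical
names, and the 'name_' prefix convention is taken from the field names in this
message.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal sketch: derive the matched languages for one document from the
// names of the fields that came back with at least one highlight snippet.
final class MatchedLanguagesSketch {

    static Set<String> matchedLanguages(Map<String, List<String>> docHighlights,
                                        String prefix) {
        Set<String> langs = new TreeSet<>();
        if (docHighlights == null) {
            return langs; // no highlighting entry for this document
        }
        for (Map.Entry<String, List<String>> e : docHighlights.entrySet()) {
            String field = e.getKey();
            List<String> snippets = e.getValue();
            boolean hasSnippet = snippets != null && !snippets.isEmpty();
            if (field.startsWith(prefix) && hasSnippet) {
                // "name_en" with prefix "name_" yields "en"
                langs.add(field.substring(prefix.length()));
            }
        }
        return langs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc25 = new LinkedHashMap<>();
        doc25.put("name_it", Arrays.asList("<em>Note</em> Test"));
        doc25.put("name_en", Arrays.asList("<em>Note</em> Test Translation"));
        System.out.println(matchedLanguages(doc25, "name_")); // prints [en, it]
    }
}
```

An empty result, as for document 27 above, means the highlighter reported no
field at all, so this approach only works once the highlighting itself is
consistent.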

I really do not understand what is happening here and what I can do to make the 
highlighting consistent.
Also, is my approach with the 'name_en', 'name_de', ... for localized field 
indexing reasonable or is there a much more preferable way?

Thank you for your help and best regards

Moritz Becker
Software Development

curecomp Software Services GmbH
Hafenstrasse 47-51
4020 Linz

web: www.curecomp.com
e-Mail: m.bec...@curecomp.com




Re: Show all fields in Solr highlighting output

2015-06-11 Thread Ahmet Arslan
Hi Edwin,

I think the highlighting behaviour for those types has shifted over time. Maybe 
we should do the reverse: 
move snippets to the main response: https://issues.apache.org/jira/browse/SOLR-3479

Ahmet






Re: DocTransformers for restructuring output, e.g. Highlighting

2015-06-11 Thread Upayavira
Yes! It only needs to be done!



Re: DocTransformers for restructuring output, e.g. Highlighting

2015-06-11 Thread Ahmet Arslan
Hi Upayavira,

I was going to suggest SOLR-3479 to Edwin when I saw your old post.

Regarding your suggestion, there is an existing ticket : 
https://issues.apache.org/jira/browse/SOLR-3479

I think SOLR-7665 is also relevant to your question.

Ahmet
 



On Sunday, June 23, 2013 9:54 PM, Upayavira u...@odoko.co.uk wrote:
I've just taken a peek at the src for DocTransformers. They get given a
TransformContext. That context contains the query and a few other bits
and pieces.

If it contained the response, DocTransformers would be able to do output
restructuring. The best example is hit highlighting. If you did:

hl=on&hl.fl=name&fl=*,[highlight:name]

you would no longer need to seek the highlighted strings in another part
of the output.

The conceptual downside of this approach is that we might expect the
highlighting to be done inside the DocTransformer, not a search component,
i.e. without needing the hl=on&hl.fl=name bit. That is, this would be a
great change for existing Solr users, but might be confusing for new
Solr users.

I did try to move the highlighting code itself into the DocTransformer,
but stalled at the point at which it needed to be CoreAware, as
DocTransformers aren't allowed to be. Without that, it isn't possible to
access the Highlighter components in the core's configuration.

Thoughts? Is this a useful feature?

Upayavira


RE: Show all fields in Solr highlighting output

2015-06-11 Thread Reitzel, Charles
Moving the highlighted snippets to the main response is a bad thing for some 
applications.  E.g. if you do any sorting or searching on the returned fields, 
you need to use the original values.   The same is true if any of the values 
are used as a key into some other system or table lookup.   Specifically, the 
insertion of markup into the text changes values that affect sorting and 
matching.

Thus the wisdom of the current design that returns highlighting results 
separately.

Of course, it is very simple to merge the highlighting results into the 
returned documents.   The highlighting results have been thoughtfully arranged 
as a lookup table using the unique ID field as the key.   In SolrJ, this is a 
Map.   Thus, you can loop over the result documents, lookup the highlight 
results for that document and overwrite the original value with the highlighted 
value.   Be sure to set your snippet size bigger than the largest value you 
expect!

Anyway, this type of thing is better handled by the application than Solr, per 
se.

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Number of documents returned, or 0 if there is no result list.
static int nDocs( QueryResponse response ) {
    int nReturned = 0;
    if ( null != response && null != response.getResults() ) {
        nReturned = response.getResults().size();
    }
    return nReturned;
}

// True if the response carries a non-empty highlighting section.
static boolean hasHighlight( QueryResponse response ) {
    boolean hasHL = false;
    if ( null != response && null != response.getHighlighting() ) {
        hasHL = response.getHighlighting().size() > 0;
    }
    return hasHL;
}

// Overwrite each document's field values with the highlighted values.
protected void mergeHighlightResults( QueryResponse response, String uniqueIdField )
{
    if ( nDocs(response) > 0 && hasHighlight(response) )
    {
        for ( SolrDocument result : response.getResults() )
        {
            // Highlighting is keyed by the document's unique ID.
            Map<String, List<String>> hlDoc
                = response.getHighlighting().get( result.getFirstValue(uniqueIdField) );
            if ( null != hlDoc && hlDoc.size() > 0 ) {
                for ( String fieldName : hlDoc.keySet() )
                {
                    List<String> hlValues = hlDoc.get( fieldName );
                    // This is the only tricky bit: this logic may not work all that
                    // well for multi-valued fields. You cannot reliably match the
                    // altered values to an original value. So, if any HL values
                    // are returned, just replace all values with HL values.
                    // This will not work 100% of the time.

                    int ix = 0;
                    for ( String hlVal : hlValues ) {
                        if ( 0 == ix++ ) {
                            // The first value replaces whatever was there.
                            result.setField( fieldName, hlVal );
                        }
                        else {
                            // Later values are appended.
                            result.addField( fieldName, hlVal );
                        }
                    }
                }
            }
        }
    }
}



Re: Show all fields in Solr highlighting output

2015-06-11 Thread Ahmet Arslan
Hi Edwin,

hl.alternateField is probably what you are looking for.

ahmet






Re: Show all fields in Solr highlighting output

2015-06-11 Thread Zheng Lin Edwin Yeo
Thank you for the info. I will try to implement it.

Regards,
Edwin


Re: Show all fields in Solr highlighting output

2015-06-11 Thread Zheng Lin Edwin Yeo
Hi Ahmet,

I've tried that, but those fields still do not show up.

Those fields are actually of type=float, type=date and type=int.

Are those field types not able to be highlighted by default?

Regards,
Edwin



On 11 June 2015 at 15:03, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:

> Hi Edwin,
>
> hl.alternateField is probably what you are looking for.
>
> ahmet







Show all fields in Solr highlighting output

2015-06-10 Thread Zheng Lin Edwin Yeo
Hi,

Is it possible to list all the fields in the highlighting portion of the
output?
Currently, even when I set <str name="hl.fl">*</str>, it only shows fields
where highlighting is possible; fields that cannot be highlighted are not
shown.

I would like the output to show all the fields together, regardless of
whether highlighting is possible.


Regards,
Edwin


Problem with distributed search using grouping and highlighting

2015-06-08 Thread Rich Hume
I am currently using Solr 4.5.1.  In the hopes of seeing better query 
performance, I have sharded an index and I am trying to use the shards 
parameter along with grouping and highlighting.  I am not currently using 
SolrCloud.

I got past an earlier problem by adding a second sort parameter (as described 
in JIRA SOLR-5046).  Unfortunately, I have found nothing related to my latest 
index-out-of-bounds problem.  I do not believe that JIRA SOLR-5709 is related, 
since my unique keys are in fact unique across the shards.

If anyone can point out something that I am doing wrong it would be greatly 
appreciated.

Thanks,
Rich

I am seeing the following error; the parameters I am passing are below the 
stack trace.

null:java.lang.ArrayIndexOutOfBoundsException: 35
    at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:185)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)


Here are the parameters I am passing:

group=true&group.offset=0&group.limit=10&group.field=DeDup
group.query=DocumentTypes:34&group.query=DocumentTypes:35&group.query=DocumentTypes:32
shards=localhost:8983/solr/IX1,localhost:8983/solr/IX2
fq=+DocumentTypes:(34 35 32)
defType=edismax&qf=csTitle^100 csContent
q=any matching search string
start=0&rows=10
fl=PageNumber,FilePath,DocumentGUID,ResultDisplayContent,DocumentTypes
sort=score desc,DocumentGUID asc
hl=on
hl.fl=csTitle,csContent
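
For anyone trying to reproduce this, the parameters above have to be joined
with '&' and URL-encoded before they go on the wire. The following is only a
sketch of that step in plain Java: the class name ShardedHighlightQuery and
the /select handler path are assumptions, and the host and core name are
taken from the shards parameter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles the request parameters listed above into
// one URL-encoded query string.
final class ShardedHighlightQuery {

    static String[] kv(String name, String value) {
        return new String[] {name, value};
    }

    /** Joins name/value pairs with '&', URL-encoding both sides. */
    static String toQueryString(List<String[]> params) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(p[0], StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(p[1], StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> params = new ArrayList<>();
        params.add(kv("group", "true"));
        params.add(kv("group.offset", "0"));
        params.add(kv("group.limit", "10"));
        params.add(kv("group.field", "DeDup"));
        // Repeated parameters are simply added once per value.
        params.add(kv("group.query", "DocumentTypes:34"));
        params.add(kv("group.query", "DocumentTypes:35"));
        params.add(kv("group.query", "DocumentTypes:32"));
        params.add(kv("shards", "localhost:8983/solr/IX1,localhost:8983/solr/IX2"));
        params.add(kv("fq", "+DocumentTypes:(34 35 32)"));
        params.add(kv("defType", "edismax"));
        params.add(kv("qf", "csTitle^100 csContent"));
        params.add(kv("q", "any matching search string"));
        params.add(kv("start", "0"));
        params.add(kv("rows", "10"));
        params.add(kv("fl", "PageNumber,FilePath,DocumentGUID,ResultDisplayContent,DocumentTypes"));
        params.add(kv("sort", "score desc,DocumentGUID asc"));
        params.add(kv("hl", "on"));
        params.add(kv("hl.fl", "csTitle,csContent"));

        // The /select path is an assumption; use whatever handler is configured.
        System.out.println("http://localhost:8983/solr/IX1/select?" + toQueryString(params));
    }
}
```

Encoding matters here because values such as the fq and qf strings contain
'+', '^', spaces and parentheses, which would otherwise be misread.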




Re: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-11 Thread William Bell
Has anyone looked at it?

On Sun, May 3, 2015 at 10:18 AM, jaime spicciati <jaime.spicci...@gmail.com>
wrote:

> We ran into this as well on 4.10.3 (not related to an upgrade). It was
> identified during load testing when a small percentage of queries would
> take more than 20 seconds to return. We were able to isolate it by
> rerunning the same query multiple times and regardless of cache hits the
> queries would still take a long time to return. We used this method to
> narrow down the performance problem to a small number of very large records
> (many many fields in a single record).
>
> We fixed it by turning on hl.requireFieldMatch on the query so that only
> fields that have an actual hit are passed through the highlighter.
>
> Hopefully this helps,
> Jaime Spicciati


Re: Slow highlighting on Solr 5.0.0

2015-05-11 Thread Ere Maijala
Thanks for the pointers. Using hl.usePhraseHighlighter=false does indeed 
make it a lot faster. Obviously it's not really a solution, though, 
since in 4.10 it wasn't a problem and turning it off has consequences. 
I'm looking forward to the improvements in the next releases.


--Ere

8.5.2015, 19.06, Matt Hilt wrote:

> I've been looking into this again. The phrase highlighter is much slower
> than the default highlighter, so you might be able to add
> hl.usePhraseHighlighter=false to your query to make it faster. Note that
> web interface will NOT help here, because that param is true by default,
> and the checkbox is basically broken in that respect. Also, the default
> highlighter doesn't seem to work in all case the phrase highlighter does
> though.
>
> Also, the current development branch of 5x is much better than 5.1, but
> not as good as 4.10. This ticket seems to be hitting on some of the issues
> at hand:
> https://issues.apache.org/jira/browse/SOLR-5855
>
>
> I think this means they are getting there, but the performance is really
> still much worse than 4.10, and its not obvious why.





--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Slow highlighting on Solr 5.0.0

2015-05-08 Thread Matt Hilt
I've been looking into this again. The phrase highlighter is much slower
than the default highlighter, so you might be able to add
hl.usePhraseHighlighter=false to your query to make it faster. Note that the
web interface will NOT help here, because that param is true by default,
and the checkbox is basically broken in that respect. Also, the default
highlighter doesn't seem to work in all the cases the phrase highlighter
does.

Also, the current development branch of 5.x is much better than 5.1, but
not as good as 4.10. This ticket seems to be hitting on some of the issues
at hand:
https://issues.apache.org/jira/browse/SOLR-5855


I think this means they are getting there, but the performance is really
still much worse than 4.10, and it's not obvious why.


On 5/5/15, 2:06 AM, Ere Maijala <ere.maij...@helsinki.fi> wrote:

> I'm seeing the same with Solr 5.1.0 after upgrading from 4.10.2. Here
> are my timings:
>
> 4.10.2:
> process: 1432.0
> highlight: 723.0
>
> 5.1.0:
> process: 9570.0
> highlight: 8790.0
>
> schema.xml and solrconfig.xml are available at
> https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf
> .
>
> A couple of jstack outputs taken when the query was executing are
> available at http://pastebin.com/eJrEy2Wb
>
> Any suggestions would be appreciated. Or would it make sense to just
> file a JIRA issue?
>
> --Ere



Re: Slow highlighting on Solr 5.0.0

2015-05-05 Thread Ere Maijala
I'm seeing the same with Solr 5.1.0 after upgrading from 4.10.2. Here 
are my timings:


4.10.2:
process: 1432.0
highlight: 723.0

5.1.0:
process: 9570.0
highlight: 8790.0
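
To put the two sets of timings side by side, here is a small arithmetic
sketch. It assumes the figures are milliseconds as reported by Solr's debug
timing output; the class name HighlightTimingRatio is made up for
illustration.

```java
// Hypothetical helper for comparing the debug timings quoted above.
final class HighlightTimingRatio {

    /** Fraction of the total process time that was spent in highlighting. */
    static double share(double highlightMs, double processMs) {
        return highlightMs / processMs;
    }

    /** How many times slower the "after" figure is than the "before" figure. */
    static double slowdown(double beforeMs, double afterMs) {
        return afterMs / beforeMs;
    }

    public static void main(String[] args) {
        double share4 = share(723.0, 1432.0);     // 4.10.2
        double share5 = share(8790.0, 9570.0);    // 5.1.0
        double factor = slowdown(723.0, 8790.0);  // highlight phase only

        System.out.println("4.10.2 highlight share of process time: " + share4);
        System.out.println("5.1.0 highlight share of process time:  " + share5);
        System.out.println("highlight slowdown factor:              " + factor);
        // Time outside highlighting: 1432 - 723 = 709 vs 9570 - 8790 = 780.
        System.out.println("non-highlight time, 4.10.2 vs 5.1.0:    "
                + (1432.0 - 723.0) + " vs " + (9570.0 - 8790.0));
    }
}
```

In other words, highlighting goes from roughly half of the process time to
over nine tenths of it, a slowdown of about twelve times, while the rest of
the request takes nearly the same time in both versions.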

schema.xml and solrconfig.xml are available at 
https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf.


A couple of jstack outputs taken when the query was executing are 
available at http://pastebin.com/eJrEy2Wb


Any suggestions would be appreciated. Or would it make sense to just 
file a JIRA issue?


--Ere

3.3.2015, 0.48, Matt Hilt wrote:

> Short form:
> While testing Solr 5.0.0 within our staging environment, I noticed that
> highlight enabled queries are much slower than I saw with 4.10. Are
> there any obvious reasons why this might be the case? As far as I can
> tell, nothing has changed with the default highlight search component or
> its parameters.
>
>
> A little more detail:
> The bulk of the collection config set was stolen from the basic 4.X
> example config set. I changed my schema.xml and solrconfig.xml just
> enough to get 5.0 to create a new collection (removed non-trie fields,
> some other deprecated response handler definitions, etc). I can provide
> my version of the solr.HighlightComponent config, but it is identical to
> the sample_techproducts_configs example in 5.0.  Are there any other
> config files I could provide that might be useful?
>
>
> Number on "much slower":
> I indexed a very small subset of my data into the new collection and
> used the /select interface to do a simple debug query. Solr 4.10 gives
> the following pertinent info:
> response: { numFound: 72628,
> ...
> debug: {
> timing: { time: 95, process: { time: 94, query: { time: 6 },
> highlight: { time: 84 }, debug: { time: 4 } }
> ---
> Whereas solr 5.0 is:
> response: { numFound: 1093,
> ...
> debug: {
> timing: { time: 6551, process: { time: 6549, query: { time:
> 0 }, highlight: { time: 6524 }, debug: { time: 25 }






--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-03 Thread jaime spicciati
We ran into this as well on 4.10.3 (not related to an upgrade). It was
identified during load testing when a small percentage of queries would
take more than 20 seconds to return. We were able to isolate it by
rerunning the same query multiple times and regardless of cache hits the
queries would still take a long time to return. We used this method to
narrow down the performance problem to a small number of very large records
(many many fields in a single record).

We fixed it by turning on hl.requireFieldMatch on the query so that only
fields that have an actual hit are passed through the highlighter.
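
As a rough illustration of what hl.requireFieldMatch changes, here is a toy
model in plain Java. It is not Solr's implementation: the class
RequireFieldMatchSketch, the Clause type and the simple string replacement
are all assumptions made for the example. It only mirrors the documented
behaviour that, with the flag on, a query clause marks up text only in the
field it was issued against.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: a query is a list of (field, term) clauses, and
// "highlighting" wraps matching terms in <em> tags.
final class RequireFieldMatchSketch {

    static final class Clause {
        final String field;
        final String term;

        Clause(String field, String term) {
            this.field = field;
            this.term = term;
        }
    }

    static String highlight(String fieldName, String text,
                            List<Clause> query, boolean requireFieldMatch) {
        String out = text;
        for (Clause c : query) {
            if (requireFieldMatch && !c.field.equals(fieldName)) {
                continue; // the clause targets another field: leave this one alone
            }
            out = out.replace(c.term, "<em>" + c.term + "</em>");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Clause> query = Arrays.asList(new Clause("title", "solr"));

        // Flag off: the term is marked up in every requested field.
        System.out.println(highlight("body", "about solr", query, false));
        // Flag on: only the field the clause was issued against is marked up.
        System.out.println(highlight("body", "about solr", query, true));
        System.out.println(highlight("title", "solr guide", query, true));
    }
}
```

On a real request the equivalent is simply adding hl.requireFieldMatch=true
to the query parameters.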

Hopefully this helps,
Jaime Spicciati

On Sat, May 2, 2015 at 8:20 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Hi,
>
> Can you also include the details of your research that narrowed the issue
> to the highlighter?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/


Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-02 Thread Cheng, Sophia Kuen
Hello,

We recently upgraded Solr from 3.8.0 to 4.10.3.  We saw that this upgrade 
caused an incredible slowdown in our searches. We were able to narrow it down 
to the highlighting. The slowdown is extreme enough that we are holding back 
our release until we can resolve this.  Our research indicated that using term 
vectors and the FastVectorHighlighter was the way to go; however, this still 
does nothing for the performance. I think we may be overlooking a crucial 
configuration setting, but cannot figure out which. I was hoping for some 
guidance and help. Sorry for the long email; I wanted to provide enough 
information.

Our documents are largely dynamic fields, so we have been using '*' as the 
field for highlighting. This is the same setting we used in prior versions of 
Solr. The dynamic fields are of type 'text', and we added customizations to 
the schema.xml for the type 'text':

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
    storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
    termOffsets="true">
  <analyzer type="index">
    <!--  this charFilter removes all xml-tagging from the text: -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <!--  this charFilter removes all xml-tagging from the text. Needed also in
      query due to autosuggest -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
</fieldType>

One of the two dynamic fields we use:

<dynamicField name="DTPropValue_*"  type="text"    indexed="true"
    stored="true" required="false" multiValued="true"/>

In our solrConfig.xml file, we have:

<requestHandler name="/eiHandler" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">13</int>
    <bool name="tv">true</bool>
    <bool name="hl.useFastVectorHighligter">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>
    <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">70</int>
        <float name="hl.regex.slop">0.5</float>
        <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
      </lst>
    </fragmenter>

    <formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
      <lst name="defaults">
        <str name="hl.simple.pre"><![CDATA[<i>]]></str>
        <str name="hl.simple.post"><![CDATA[</i>]]></str>
      </lst>
    </formatter>

    <encoder name="html" class="solr.highlight.HtmlEncoder" />
    <fragListBuilder name="simple"
        class="solr.highlight.SimpleFragListBuilder"/>
    <fragListBuilder name="single"
        class="solr.highlight.SingleFragListBuilder"/>
    <fragListBuilder name="weighted" default="true"
        class="solr.highlight.WeightedFragListBuilder"/>
    <fragmentsBuilder name="default" default="true"
        class="solr.highlight.ScoreOrderFragmentsBuilder">
    </fragmentsBuilder>

    <!-- multi-colored tag FragmentsBuilder -->
    <fragmentsBuilder name="colored"
        class="solr.highlight.ScoreOrderFragmentsBuilder">
      <lst name="defaults">
        <str name="hl.tag.pre"><![CDATA[
             <b style="background:yellow">,<b style="background:lawgreen">,
             <b style="background:aquamarine">,<b style="background:magenta">,
             <b style="background:palegreen">,<b style="background:coral">,
             <b style="background:wheat">,<b style="background:khaki">,
             <b style="background:lime">,<b style="background:deepskyblue">]]></str>
        <str name="hl.tag.post"><![CDATA[</b>]]></str>
      </lst>
    </fragmentsBuilder>

    <boundaryScanner name="default" default="true"
        class="solr.highlight.SimpleBoundaryScanner">
      <lst name="defaults">
        <str name="hl.bs.maxScan">10</str>
        <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
      </lst>
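
One detail in the request handler defaults above may be worth checking: the
parameter is spelled hl.useFastVectorHighligter, whereas the Solr parameter
is hl.useFastVectorHighlighter. As far as I know Solr does not complain
about request parameters it does not recognise, so a misspelled name would
just be ignored. The sketch below shows one way to catch that kind of slip
by comparing configured names against a list of known ones. The class name
HighlightParamCheck is hypothetical and the list of known names is only a
partial selection, not an exhaustive set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sanity check for highlighting parameter names.
final class HighlightParamCheck {

    // A partial selection of highlighting parameter names; extend as needed.
    static final Set<String> KNOWN = new LinkedHashSet<>(Arrays.asList(
            "hl", "hl.fl", "hl.fragsize", "hl.alternateField",
            "hl.requireFieldMatch", "hl.usePhraseHighlighter",
            "hl.useFastVectorHighlighter", "hl.simple.pre", "hl.simple.post",
            "hl.regex.slop", "hl.regex.pattern", "hl.tag.pre", "hl.tag.post",
            "hl.bs.maxScan", "hl.bs.chars"));

    /** Returns the configured hl.* names that are not in the known list. */
    static List<String> unknown(List<String> configured) {
        List<String> result = new ArrayList<>();
        for (String name : configured) {
            if (name.startsWith("hl") && !KNOWN.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Names taken from the defaults section of the /eiHandler config above.
        List<String> configured = Arrays.asList(
                "echoParams", "rows", "tv", "hl.useFastVectorHighligter");
        System.out.println("Unrecognised highlighting params: " + unknown(configured));
    }
}
```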

RE: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-02 Thread Ryan, Michael F. (LNG-DAY)
Are you able to identify if there is a particular part of the code that is slow?

A simple way to do this is to use the jstack command (assuming your server has 
the full JDK installed). You can run it like this:
/path/to/java/bin/jstack PID

If you run that a bunch of times while your highlight query is running, you 
might be able to spot the hotspot. Usually I'll do something like this to see 
the stacktrace for the thread running the query:
/path/to/java/bin/jstack PID | grep SearchHandler -B30
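
If grep's fixed window of 30 lines cuts a stack off, another option is to
keep whole thread blocks. Here is a minimal sketch in Java, assuming the
usual jstack layout in which thread entries are separated by blank lines;
the class name ThreadDumpFilter is made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: reads a thread dump and keeps only the thread
// blocks that mention a marker such as "SearchHandler".
final class ThreadDumpFilter {

    /** Splits the dump on blank lines and returns the blocks containing the marker. */
    static List<String> blocksContaining(String dump, String marker) {
        List<String> hits = new ArrayList<>();
        for (String block : dump.split("\\n\\s*\\n")) {
            if (block.contains(marker)) {
                hits.add(block.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Expects the dump on standard input.
        String dump = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        String marker = args.length > 0 ? args[0] : "SearchHandler";
        for (String block : blocksContaining(dump, marker)) {
            System.out.println(block);
            System.out.println();
        }
    }
}
```

It could be used as: /path/to/java/bin/jstack PID | java ThreadDumpFilter.java
(single-file launch needs a reasonably recent JDK). Running it a few times
while the slow query executes and comparing the top frames serves the same
purpose as the grep approach.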

A few more questions:
- What response times are you seeing before and after the upgrade? Is 
"unusably slow" 1 second, 10 seconds...?
- If you run the exact same query multiple times, is it consistently slow? Or 
is it only slow on the first run?
- While the query is running, do you see high user CPU on your server, or high 
IO wait, or both? (You can check this with the top command or vmstat command in 
Linux.)

-Michael


Re: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-02 Thread Joel Bernstein
Hi,

Can you also include the details of your research that narrowed the issue
to the highlighter?

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) 
michael.r...@lexisnexis.com wrote:

 Are you able to identify if there is a particular part of the code that is
 slow?

 A simple way to do this is to use the jstack command (assuming your server
 has the full JDK installed). You can run it like this:
 /path/to/java/bin/jstack PID

 If you run that a bunch of times while your highlight query is running,
 you might be able to spot the hotspot. Usually I'll do something like this
 to see the stacktrace for the thread running the query:
 /path/to/java/bin/jstack PID | grep SearchHandler -B30
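
The repeated-sampling idea above can be automated. The following is a minimal sketch, not part of the original advice: it assumes you have saved several jstack outputs as text, and the function name `hot_frames` and its parameters are made up for illustration. It counts how often each of the topmost frames appears in threads whose stack mentions SearchHandler; frames that show up in most samples are the likely hotspot.

```python
from collections import Counter


def hot_frames(dumps, marker="SearchHandler", depth=5):
    """Count the top `depth` frames of each thread whose stack mentions `marker`.

    `dumps` is an iterable of strings, each one the full text of a jstack run.
    """
    counts = Counter()
    for dump in dumps:
        # jstack separates the per-thread sections with blank lines.
        for block in dump.split("\n\n"):
            if marker not in block:
                continue
            # Stack frames are the lines that start with "at " once stripped.
            frames = [
                line.strip()[3:]
                for line in block.splitlines()
                if line.strip().startswith("at ")
            ]
            counts.update(frames[:depth])
    return counts
```

With the dumps saved to files, something like `hot_frames(open(p).read() for p in paths).most_common(10)` would list the frames seen most often across the samples.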

 A few more questions:
 - What response times are you seeing before and after the upgrade? Is
 "unusably slow" 1 second, 10 seconds...?
 - If you run the exact same query multiple times, is it consistently slow?
 Or is it only slow on the first run?
 - While the query is running, do you see high user CPU on your server, or
 high IO wait, or both? (You can check this with the top command or vmstat
 command in Linux.)

 -Michael

 -----Original Message-----
 From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
 Sent: Saturday, May 02, 2015 4:13 PM
 To: solr-user@lucene.apache.org
 Subject: Upgraded to 4.10.3, highlighting performance unusably slow

 Hello,

 We recently upgraded Solr from 3.8.0 to 4.10.3. We saw that this upgrade
 caused an incredible slowdown in our searches. We were able to narrow it
 down to the highlighting. The slowdown is extreme enough that we are
 holding back our release until we can resolve this. Our research indicated
 that TermVectors & FastHighlighter were the way to go; however, this still
 does nothing for the performance. I think we may be overlooking a crucial
 configuration, but cannot figure it out. I was hoping for some guidance and
 help. Sorry for the long email, I wanted to provide enough information.

 Our documents are largely dynamic fields, and so we have been using '*' as
 the field for highlighting. This is the same setting we used in prior
 versions of Solr. The dynamic fields are of type 'text' and we added
 customizations to the schema.xml for the type 'text':

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
            storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
            termOffsets="true">
   <analyzer type="index">
     <!-- this charFilter removes all xml-tagging from the text: -->
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
             protected="protwords.txt"/>
   </analyzer>
   <analyzer type="query">
     <!-- this charFilter removes all xml-tagging from the text. Needed
          also in query due to autosuggest -->
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
             protected="protwords.txt"/>
   </analyzer>
 </fieldType>

 One of the two dynamic fields we use:

 <dynamicField name="DTPropValue_*" type="text" indexed="true"
               stored="true" required="false" multiValued="true"/>

 In our solrConfig.xml file, we have:

 <requestHandler name="/eiHandler" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">13</int>
     <bool name="tv">true</bool>
     <bool name="hl.useFastVectorHighligter">true</bool>
   </lst>
   <arr name="last-components">
     <str>tvComponent</str>
   </arr>
 </requestHandler>
 <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
 <searchComponent class="solr.HighlightComponent" name="highlight">
   <highlighting>
     <fragmenter name="gap" default="true"
                 class="solr.highlight.GapFragmenter">
       <lst name="defaults">
         <int name="hl.fragsize">100</int>
       </lst>
     </fragmenter>
     <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
       <lst name="defaults">
         <int name="hl.fragsize">70</int>
         <float name="hl.regex.slop">0.5</float>
         <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
       </lst>
     </fragmenter>

Solr Highlighting

2015-04-28 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

When I perform a query, the field information for each matching document is
displayed separately from the highlighting information. Is there a way to
merge the two so that the highlighting for each document appears within the
document-level information itself? That way, it would be easier to find the
highlights for a particular document.

Otherwise, is there a better way to join the two to get a consolidated
view, or does this have to be custom-built? I am using SolrJ. Please let me
know what's the best way to handle this.


Thanks & Regards
Vijay
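
For what it's worth, the join asked about above is a lookup by document id: Solr keys the highlighting section by the uniqueKey value of each document. Below is a minimal client-side sketch in Python (chosen for brevity; in SolrJ the same two pieces come from `QueryResponse.getResults()` and `QueryResponse.getHighlighting()`). The function name `merge_highlights` and the `_highlights` key are made up for illustration, and the uniqueKey field is assumed to be `id`.

```python
def merge_highlights(solr_response, id_field="id"):
    """Return a copy of each document with its highlight snippets attached.

    `solr_response` is the parsed JSON response (wt=json) as a dict.
    The snippets are stored under the hypothetical key "_highlights".
    """
    docs = solr_response["response"]["docs"]
    highlighting = solr_response.get("highlighting", {})
    merged = []
    for doc in docs:
        combined = dict(doc)  # leave the original document untouched
        # The highlighting section is keyed by the document's unique key.
        combined["_highlights"] = highlighting.get(str(doc[id_field]), {})
        merged.append(combined)
    return merged
```

Documents without any highlighted field simply get an empty `_highlights` mapping, so the caller can fall back to the stored field values.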



Re: Highlighting in Solr

2015-04-26 Thread Zheng Lin Edwin Yeo
I suppose that currently the only way to show the highlighting snippets in XML
and JSON output is via a separate section at the bottom, and that it is
not possible to show the highlighted snippets together with the
rest of the response?

Regards,
Edwin


On 22 April 2015 at 21:57, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

 Hi,

 I'm currently implementing highlighting on my Solr-5.0.0. When I issue the
 following command:
 http://localhost:8983/solr/collection1/select?q=conducted&hl=true&hl.fl=Content,Summary&wt=json&indent=true&rows=10
 the highlighting result is listed at the bottom of the output, instead of
 together with the rest of the response above. The result is shown below:

   "response":{"numFound":10,"start":0,"docs":[
   {
     "id":"1-1",
     "Summary":"i} Trial conducted",
     "Content":"Completed",
     "_version_":1498407036159787020},


   "highlighting":{
     "1-1":{
       "Summary":["i) Trial <em>conducted</em>"]}


 Is there any way to get the highlighted output to be displayed together with 
 the rest of the response, instead of having it display separately at the 
 bottom? Which is something like this


   "response":{"numFound":10,"start":0,"docs":[
   {
     "id":"1-1",
     "Summary":"i} Trial <em>conducted</em>",
     "Content":"Completed",
     "_version_":1498407036159787020},


 Regards,
 Edwin




Highlighting in Solr

2015-04-22 Thread Zheng Lin Edwin Yeo
Hi,

I'm currently implementing highlighting on my Solr-5.0.0. When I issue the
following command:
http://localhost:8983/solr/collection1/select?q=conducted&hl=true&hl.fl=Content,Summary&wt=json&indent=true&rows=10
the highlighting result is listed at the bottom of the output, instead of
together with the rest of the response above. The result is shown below:

  "response":{"numFound":10,"start":0,"docs":[
  {
    "id":"1-1",
    "Summary":"i} Trial conducted",
    "Content":"Completed",
    "_version_":1498407036159787020},


  "highlighting":{
    "1-1":{
      "Summary":["i) Trial <em>conducted</em>"]}


Is there any way to get the highlighted output to be displayed
together with the rest of the response, instead of having it display
separately at the bottom? Which is something like this


  "response":{"numFound":10,"start":0,"docs":[
  {
    "id":"1-1",
    "Summary":"i} Trial <em>conducted</em>",
    "Content":"Completed",
    "_version_":1498407036159787020},


Regards,
Edwin
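
The request in the message above carries its highlighting options as ordinary query-string parameters (hl, hl.fl, wt, indent, rows). As a small sketch of how a client might assemble that URL without hand-joining the ampersands: the helper name `highlight_url` is made up for illustration, and the base URL is simply the one quoted in the message.

```python
from urllib.parse import urlencode


def highlight_url(base, query, fields, rows=10):
    """Build a /select URL that asks Solr to highlight `fields` for `query`."""
    params = [
        ("q", query),
        ("hl", "true"),               # turn highlighting on
        ("hl.fl", ",".join(fields)),  # fields to generate snippets for
        ("wt", "json"),
        ("indent", "true"),
        ("rows", rows),
    ]
    # Keep commas literal so the hl.fl list stays readable in the URL.
    return base + "?" + urlencode(params, safe=",")


url = highlight_url(
    "http://localhost:8983/solr/collection1/select",
    "conducted",
    ["Content", "Summary"],
)
```

Whatever parameters are sent, the snippets still come back in the separate highlighting section, keyed by document id.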


Highlighting

2015-04-17 Thread Misagh Karimi


Hello All,
I am new to solr and trying to configure highlighting. If I look at the 
result in xml, or json format, I can see the highlighting part of the 
data and it looks good. However the velocity page does not show the 
highlighted words on my result page. Do I need to do something extra for 
the highlighting results to show up on the page that is generated by 
Velocity?


 Here is my hl setting in solrconfig.xml:
<str name="hl">on</str>
<str name="hl.fl">seriesTitle</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">seriesTitle</str>

Here are those fields in schema.xml:
<field name="seriesTitle" type="text" indexed="true" stored="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>

    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>

    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Thank you in advance.
--
Misagh Karimi


Re: Retrieving list of words for highlighting

2015-03-27 Thread simon
There's a JIRA ( https://issues.apache.org/jira/browse/SOLR-4722 )
describing a highlighter which returns term positions rather than
snippets, which could then be mapped to the matching words in the indexed
document (assuming that it's stored or that you have a copy elsewhere).

-Simon
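
To illustrate the mapping step described above: once an external application has positions for the matches, wrapping the words in its own copy of the text is straightforward. This is only a sketch under an assumption, namely that the positions are available as (start, end) character offsets into the stored text; the actual output format of the highlighter in SOLR-4722 is not shown in this thread, and the name `wrap_offsets` is made up.

```python
def wrap_offsets(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) character range of `text` in pre/post markers.

    `offsets` is assumed to be an iterable of (start, end) pairs, end exclusive.
    Ranges that overlap an already wrapped range are skipped.
    """
    pieces = []
    last = 0
    for start, end in sorted(offsets):
        if start < last:
            continue  # overlaps the previous match
        pieces.append(text[last:start])
        pieces.append(pre + text[start:end] + post)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)
```

The external application keeps full control over the markup this way, since `pre` and `post` can be anything it needs.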

On Wed, Mar 25, 2015 at 7:30 PM, Damien Dykman <damien.dyk...@gmail.com>
wrote:

 In Solr 5 (or 4), is there an easy way to retrieve the list of words to
 highlight?

 Use case: allow an external application to highlight the matching words
 of a matching document, rather than using the highlighted snippets
 returned by Solr.

 Thanks,
 Damien



Retrieving list of words for highlighting

2015-03-25 Thread Damien Dykman
In Solr 5 (or 4), is there an easy way to retrieve the list of words to
highlight?

Use case: allow an external application to highlight the matching words
of a matching document, rather than using the highlighted snippets
returned by Solr.

Thanks,
Damien
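
One possible workaround, offered as a sketch rather than a built-in Solr feature: request the normal highlighting section and pull the marked words out of the snippets on the client side. The example assumes the markers are `<em>` and `</em>` (adjust them to match your hl.simple.pre / hl.simple.post settings); the function name `highlighted_terms` is made up for illustration.

```python
import re


def highlighted_terms(highlighting, pre="<em>", post="</em>"):
    """Collect the distinct highlighted words per document id.

    `highlighting` is the parsed "highlighting" section of a Solr response:
    {doc_id: {field_name: [snippet, ...]}}.
    """
    pattern = re.compile(re.escape(pre) + r"(.*?)" + re.escape(post), re.DOTALL)
    terms = {}
    for doc_id, fields in highlighting.items():
        found = set()
        for snippets in fields.values():
            for snippet in snippets:
                # Lowercase so "Trial" and "trial" count as one term.
                found.update(match.lower() for match in pattern.findall(snippet))
        terms[doc_id] = sorted(found)
    return terms
```

Note that this only sees words inside the snippets Solr actually returns, so matches outside the returned fragments would be missed unless the snippet settings (for example hl.snippets and hl.fragsize) are widened.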

