Re: Solr's suggester results

2019-06-19 Thread ppunet
Here is my problem statement; I would really appreciate your feedback.

1. Thousands of PDFs with large amounts of content are indexed in
Solr.
2. I am using AnalyzingInfixSuggester for the suggestions.

Q. The SuggestComponent returns the entire content of the field in
its suggestions. How can I make the suggester return only part of
the field's content instead of the entire content, which in my
scenario is quite long?


Thanks in advance.

PD





Re: Solr's suggester results

2015-06-17 Thread Zheng Lin Edwin Yeo
I'm using the FreeTextLookupFactory in my implementation now.

Yes, now it can suggest part of the field from the middle of the content.

I read that this implementation is able to consider the previous tokens
when making the suggestions. However, when I try to enter a search phrase,
it seems that it is only considering the last token and not any of the
previous tokens.

For example, when I search for
http://localhost:8983/edm/collection1/suggest?suggest.q=trouble free, it is
giving me suggestions based on the word 'free' only, and not 'trouble free'.

This is my configuration:

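In solrconfig.xml [XML tags lost in the archive; only the element values survive]:

Search component:
FreeTextLookupFactory
suggester_freetext_dir
DocumentDictionaryFactory
Suggestion
suggestType
5
false
false

Request handler defaults:
json
true
true
10
mySuggester

Components:
suggest

In schema.xml: [definitions lost in the archive]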

Have I misconfigured anything? I've set ngrams to 5, which I understand to
mean that it should consider up to the previous 5 tokens entered.


Regards,
Edwin
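
For readers following along: the configuration values in this message lost their XML tags in the archive. A plausible reconstruction, using parameter names from the Solr Suggester documentation, might look like the sketch below. The values come from the message; the mapping of `suggester_freetext_dir` to `indexPath`, of the two `false` values to `buildOnStartup` and `buildOnCommit`, and the `name`, `class`, and `startup` attributes are assumptions that the message does not confirm.

```xml
<!-- Sketch only: tag and parameter names are assumed; values are from the message -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <!-- assumed: chosen to match suggest.dictionary in the handler below -->
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <!-- assumed parameter name for the directory value -->
    <str name="indexPath">suggester_freetext_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
    <int name="ngrams">5</int>
    <!-- assumed: which flag each "false" belongs to is not recoverable -->
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

On the ngrams question: the Lucene Javadoc for FreeTextSuggester describes the model as predicting from the last ngrams-1 tokens of the request, so a value of 5 would consider up to four preceding tokens, with shorter contexts used when a longer one has no match. If that reading is right, a query such as 'trouble free' falling back to 'free' may simply mean that the two-word sequence is absent from the built model.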



Re: Solr's suggester results

2015-06-17 Thread Alessandro Benedetti
Edwin,
Spellcheck is one thing; the Suggester is another.

If you need to provide auto-suggestions to your users, the suggester is the
right thing to use.
But I really doubt it is useful to use the entire content as the suggester
field.
It is going to be quite expensive.

In that case I would again suggest you take a look at the article
I linked and the general Solr documentation.

It is possible to suggest part of the field.
You can use the FreeText suggester with a proper analysis selected.

Cheers
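
What "a proper analysis" might mean in practice: the FreeText suggester works on the tokens the analyzer emits, so a plain tokenizing field type without stemming or stop-word removal is one reasonable starting point. A minimal sketch, assuming the field type is named `suggestType` as elsewhere in this thread; the tokenizer and filter choice is illustrative and not taken from the thread.

```xml
<!-- Sketch only: analysis chain is an illustrative assumption -->
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split the text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- make matching case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```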


Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
Yes, I've looked at that before, but I was told that newer versions of
Solr have their own suggester that no longer needs the spellchecker?

So it's not necessary to use the spellchecker inside the suggester anymore?

Regards,
Edwin



Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
Have you looked at spellchecker? Because that sounds much more like
what you're asking about than suggester.

Spell checking is more what you're asking for. Have you even looked at that
after it was suggested?

bq: Also, when I do a search, it shouldn't be returning whole fields,
but just to return a portion of the sentence

This is what highlighting is built for.

Really, I recommend you take the time to do some familiarization with the
whole search space and Solr. The excellent book here:

http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8&qid=1434513284&sr=8-1&keywords=apache+solr&pebp=1434513287267&perid=0YRK508J0HJ1N3BAX20E

will give you the grounding you need to get the most out of Solr.

Best,
Erick
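
To illustrate the highlighting suggestion: Solr's standard highlighting parameters can be set as request handler defaults so that search results carry short snippets rather than whole fields. A minimal sketch; the handler name `/select` and the field name `content` are assumptions for illustration, and the field needs to be stored for highlighting to return text.

```xml
<!-- Sketch only: handler and field names are hypothetical -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- turn highlighting on for every request to this handler -->
    <str name="hl">true</str>
    <!-- field(s) to build snippets from -->
    <str name="hl.fl">content</str>
    <!-- at most two snippets per field -->
    <int name="hl.snippets">2</int>
    <!-- approximate snippet length in characters -->
    <int name="hl.fragsize">100</int>
  </lst>
</requestHandler>
```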


Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
The long content comes from indexing PDF files. Because some PDF files
have a lot of words in their content, indexing fails with the error *UTF8
encoding is longer than the max length 32766*.

I think the problem is that the content size of the PDF file exceeds 32766
characters?

What I'm trying to accomplish is to index documents of any
size (even those with very large contents) and build the suggester from
them. Also, when I run a search, it shouldn't return whole fields,
just a portion of the sentence.



Regards,
Edwin
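
A note on the error quoted in this message: in Lucene the 32766 limit applies to the UTF-8 byte length of a single indexed term, not to the size of the document, so it typically appears when a whole document lands in an untokenized field (for example a `string` type) or passes through an analyzer that emits the entire text as one token. One way to sidestep it is sketched below, assuming a tokenized `content` field for search and a bounded copy for suggestions. The field names, the `text_general` type, and the 200-character cap are illustrative assumptions.

```xml
<!-- Sketch only: field names and the maxChars value are hypothetical -->
<!-- tokenized full-text field: each term stays far below the per-term limit -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<!-- short field used as the suggester source -->
<field name="Suggestion" type="suggestType" indexed="true" stored="true"/>
<!-- copy only the first 200 characters so suggestions stay short -->
<copyField source="content" dest="Suggestion" maxChars="200"/>
```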



Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
The suggesters are built to return whole fields. You _might_
be able to add multiple fragments to a multiValued
entry and get fragments back. I haven't tried that, though,
and I suspect that you'd actually get the same thing.

This is an XY problem IMO. Please describe exactly what
you're trying to accomplish, with examples, rather than
continuing to pursue this path. It sounds like you want
spellcheck or similar. The _point_ behind the
suggesters is that they handle multiple-word suggestions
by returning the whole field. So putting long text fields
into them is not going to work.

Best,
Erick



Re: Solr's suggester results

2015-06-16 Thread Alessandro Benedetti
Inline:

2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo :

> Thanks Benedetti,
>
> I've changed to the AnalyzingInfixLookup approach, and it is able to start
> searching from the middle of the field.
>
> However, is it possible to make the suggester show only part of the
> content of the field (like 2 or 3 fields after), instead of the entire
> content/sentence, which can be quite long?
>

I assume you use "fields" in place of "tokens".
The answer is yes, as I already said in my previous mail; I invite you to
read the answers and the linked documentation carefully!

Regarding the excessive size of the tokens: this is weird. What are you
trying to autocomplete?
I really doubt it would be useful for a user to see super long autocompleted
terms.

Cheers

>
>
> Regards,
> Edwin
>
>
>
> On 15 June 2015 at 17:33, Alessandro Benedetti wrote:
>
> > Hehe Edwin, I think you should reread the document I linked some time
> > ago:
> >
> > http://lucidworks.com/blog/solr-suggester/
> >
> > The suggester you used is not meant to provide infix suggestions.
> > The fuzzy suggester works on a fuzzy basis, with the *starting*
> > terms of the field content.
> >
> > What you are looking for is actually one of the Infix Suggesters.
> > For example the AnalyzingInfixLookup approach.
> >
> > When working with suggesters, it is important first to make a distinction:
> >
> > 1) Returning the full content of the field (AnalyzingInfix or Fuzzy)
> >
> > 2) Returning token(s) (Free Text Suggester)
> >
> > Then the second difference is:
> >
> > 1) Infix suggestions (from the "middle" of the field content)
> > 2) Classic suggester (from the beginning of the field content)
> >
> > Once that is clear, it will be quite simple to work with suggesters.
> >
> > Cheers
> >
> > 2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo :
> >
> > > I've indexed a rich-text documents with the following content:
> > >
> > > This is a testing rich text documents to test the uploading of files to
> > > Solr
> > >
> > >
> > > When I tried to use the suggestion, it return me the entire field in
> the
> > > content once I enter suggest?q=t. However, when I tried to search for
> > > q='rich', I don't get any results returned.
> > >
> > > This is my current configuration for the suggester:
> > > 
> > >   
> > > mySuggester
> > > FuzzyLookupFactory
> > > DocumentDictionaryFactory
> > > Suggestion
> > > suggestType
> > > true
> > > false
> > >   
> > > 
> > >
> > >  > startup="lazy" >
> > >   
> > > json
> > > true
> > >
> > > true
> > > 10
> > > mySuggester
> > >   
> > >   
> > > suggest
> > >   
> > > 
> > >
> > > Is it possible to allow the suggester to return something even from the
> > > middle of the sentence, and also not to return the entire sentence if
> the
> > > sentence. Perhaps it should just suggest the next 2 or 3 fields, and to
> > > return more fields as the users type.
> > >
> > > For example,
> > > When user type 'this', it should return 'This is a testing'
> > > When user type 'this is a testing', it should return 'This is a testing
> > > rich text documents'.
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr's suggester results

2015-06-15 Thread Zheng Lin Edwin Yeo
Also, is there a way to overcome the long content problem?

I get this error when I index large rich-text documents and then try to
build the suggester.

{
  "responseHeader":{
    "status":500,
    "QTime":47},
  "error":{
    "msg":"Document contains at least one immense term in
field=\"exacttext\" (whose UTF8 encoding is longer than the max length
32766), all of which were skipped.  Please correct the analyzer to not
produce such terms.  The prefix of the first immense term is: '[32, 10, 32,
10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10,
32, 32, 10, 32, 32, 10, 32, 32]...', original message: bytes can be at most
32766 in length; got 139402",
    "trace":"java.lang.IllegalArgumentException: Document contains at
least one immense term in field=\"exacttext\" (whose UTF8 encoding is
longer than the max length 32766), all of which were skipped.  Please
correct the analyzer to not produce such terms.  The prefix of the first
immense term is: '[32, 10, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32,
32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32]...',
original message: bytes can be at most 32766 in length; got 139402\r\n\tat
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)\r\n\tat
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)\r\n\tat
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)\r\n\tat
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)\r\n\tat
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)\r\n\tat
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350)\r\n\tat
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1138)\r\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.add(AnalyzingInfixSuggester.java:381)\r\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:310)\r\n\tat
org.apache.lucene.search.suggest.Lookup.build(Lookup.java:193)\r\n\tat
org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:163)\r\n\tat
org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:179)\r\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\r\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)\r\n\tat
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)\r\n\tat
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:368)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\r\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(Abstra
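
The error above reports a single term of 139402 bytes against a limit of
32766, and the byte prefix (32 and 10 are space and newline) suggests the
value starts with a long run of whitespace. One possible workaround,
sketched here under the assumption that the suggestion text can be
pre-processed before it is sent to Solr, is to collapse whitespace and cap
the UTF-8 length. The function name shorten_for_suggester is hypothetical
and not part of any Solr client.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the error message


def shorten_for_suggester(text: str, max_bytes: int = MAX_TERM_BYTES) -> str:
    """Collapse whitespace runs and cap the UTF-8 size of a suggestion value."""
    collapsed = " ".join(text.split())
    encoded = collapsed.encode("utf-8")
    if len(encoded) <= max_bytes:
        return collapsed
    # Cut on the byte budget, then drop any partial trailing character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


raw = " \n \n  \n " + "word " * 40000
safe = shorten_for_suggester(raw)
print(len(safe.encode("utf-8")) <= MAX_TERM_BYTES)
```

A value this long is unlikely to be a useful suggestion anyway, so feeding
the suggester from a shorter field may be the better fix; the cap only
avoids the build failure.
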

Re: Solr's suggester results

2015-06-15 Thread Zheng Lin Edwin Yeo
Thanks Benedetti,

I've changed to the AnalyzingInfixLookup approach, and it is able to start
searching from the middle of the field.

However, is it possible to make the suggester show only part of the
content of the field (like 2 or 3 fields after), instead of the entire
content/sentence, which can be quite long?


Regards,
Edwin





Re: Solr's suggester results

2015-06-15 Thread Alessandro Benedetti
ehehe Edwin, I think you should re-read the document I linked a while ago:

http://lucidworks.com/blog/solr-suggester/

The suggester you used is not meant to provide infix suggestions.
The fuzzy suggester matches on a fuzzy basis, against the *starting* terms
of the field content.

What you are looking for is actually one of the infix suggesters,
for example the AnalyzingInfixLookup approach.

When working with suggesters, it is important first to make a distinction:

1) Returning the full content of the field (AnalyzingInfix or Fuzzy)

2) Returning token(s) (Free Text Suggester)

Then the second distinction is:

1) Infix suggestions (from the "middle" of the field content)
2) Classic suggester (from the beginning of the field content)

Once that is clear, working with suggesters becomes quite simple.
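
The two distinctions above can be illustrated with a toy model. This is
only a sketch of the matching behaviour, not how Solr or Lucene implement
it; all function names are invented for the example and tokens are assumed
to be whitespace-separated.

```python
def classic_match(content: str, typed: str) -> bool:
    """Classic suggester: match only from the beginning of the field content."""
    return content.lower().startswith(typed.lower())


def infix_match(content: str, typed: str) -> bool:
    """Infix suggester: match at the start of any token in the field content."""
    typed = typed.lower()
    return any(tok.startswith(typed) for tok in content.lower().split())


def suggest_full_content(contents, typed, match):
    """Return the whole field content of every matching document."""
    return [c for c in contents if match(c, typed)]


def suggest_tokens(contents, typed):
    """Return individual matching tokens instead of the whole field content."""
    typed = typed.lower()
    found = []
    for content in contents:
        for tok in content.lower().split():
            if tok.startswith(typed) and tok not in found:
                found.append(tok)
    return found


docs = ["This is a testing rich text documents to test "
        "the uploading of files to Solr"]
print(suggest_full_content(docs, "rich", classic_match))  # no match from the start
print(suggest_full_content(docs, "rich", infix_match))    # whole content returned
print(suggest_tokens(docs, "te"))                         # tokens only
```

In this model a query for "t" matches from the start and returns the whole
content, while "rich" only matches once infix matching is used, which is the
behaviour reported in the original question.
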

Cheers

2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo :

> I've indexed a rich-text document with the following content:
>
> This is a testing rich text documents to test the uploading of files to
> Solr
>
>
> When I tried to use the suggester, it returned the entire content of the
> field as soon as I entered suggest?q=t. However, when I searched for
> q='rich', no results were returned.
>
> This is my current configuration for the suggester:
> 
>   
> mySuggester
> FuzzyLookupFactory
> DocumentDictionaryFactory
> Suggestion
> suggestType
> true
> false
>   
> 
>
> 
>   
> json
> true
>
> true
> 10
> mySuggester
>   
>   
> suggest
>   
> 
>
> Is it possible for the suggester to return something even from the
> middle of the sentence, and also not to return the entire sentence if the
> sentence is long? Perhaps it should just suggest the next 2 or 3 fields, and
> return more fields as the user types.
>
> For example,
> When the user types 'this', it should return 'This is a testing'
> When the user types 'this is a testing', it should return 'This is a testing
> rich text documents'.
>
>
> Regards,
> Edwin
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England