Re: Autosuggest on very large index

2013-08-20 Thread Greg Preston
DocValues looks interesting, a non-inverted field.  I'll play with it
a bit and see how it works.  Thanks for the suggestion.

I don't know how many total terms we've got, but each "document" is
only 2-5 words/terms on average, and there is a TON of overlap between
docs.



-Greg


On Tue, Aug 20, 2013 at 11:38 AM, Jack Krupansky wrote:
> Sounds like a problem for DocValues - assuming the number of unique values
> fits reasonably in memory to avoid I/O.
>
> How many unique values do you have, or contemplate, for your two billion
> documents?
>
> Two possibilities:
>
> 1. You need a lot more hardware.
> 2. You need to scale back your ambitions.
>
> -- Jack Krupansky
>
> -Original Message-
> From: Greg Preston
> Sent: Tuesday, August 20, 2013 2:00 PM
> To: solr-user@lucene.apache.org
> Subject: Autosuggest on very large index
>
> Using 4.4.0 -
>
> I would like to be able to do an autosuggest query against one of the
> fields in our index and have the results be limited by an fq.
>
> I can get exactly the results I want with a facet query using a
> facet.prefix, but the first query takes ~5 minutes to run on our QA
> env (~240M docs).  I'm afraid to attempt it on prod (~2B docs).
> Subsequent queries are sufficiently fast (~500ms).
>
> I'm assuming the first query is uninverting the field.  Is there any
> way to mark that field so that an uninverted copy is maintained as
> updates come in?  We plan to soft commit every 5 minutes, and we'd
> prefer to not be continuously uninverting this one field.
>
> Or is there a better way to do what I'm trying to do?  I've looked at
> the spellcheck component a little bit, but it looks like I can't
> filter results by fq.  The fq I'm using is based on which client is
> logged in, and we can't autosuggest terms from one client to another.
>
> Thanks.
>
> -Greg


Re: Autosuggest on very large index

2013-08-20 Thread Jack Krupansky
Sounds like a problem for DocValues - assuming the number of unique values 
fits reasonably in memory to avoid I/O.
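
To make the "uninverting" cost concrete, here is a toy Python model. It is only an illustration under simplifying assumptions, not Lucene's actual data structures: the names `build_inverted`, `uninvert`, and `facet_prefix` are hypothetical. The point it tries to show is that an inverted index maps term -> docs, while prefix faceting under a filter wants doc -> terms, so the first facet request has to walk every posting to build that view. DocValues store the doc -> value form at index time, which is why they avoid that first-query penalty.

```python
from collections import defaultdict

# Toy corpus: doc id -> terms in the field (hypothetical data).
DOCS = {0: ["solr", "search"], 1: ["solr", "suggest"], 2: ["search", "server"]}


def build_inverted(docs):
    """Inverted index: term -> list of doc ids containing it."""
    inverted = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            inverted[term].append(doc_id)
    return dict(inverted)


def uninvert(inverted, num_docs):
    """Walk every posting once to build doc -> term ordinals.

    This full pass is the expensive step in this toy model; a
    doc-values-style column would already be stored in this shape.
    """
    terms = sorted(inverted)  # ordinal = position in sorted term list
    per_doc = [[] for _ in range(num_docs)]
    for ordinal, term in enumerate(terms):
        for doc_id in inverted[term]:
            per_doc[doc_id].append(ordinal)
    return terms, per_doc


def facet_prefix(terms, per_doc, matching_docs, prefix):
    """Count terms starting with `prefix`, over the filtered docs only."""
    counts = {}
    for doc_id in matching_docs:
        for ordinal in per_doc[doc_id]:
            term = terms[ordinal]
            if term.startswith(prefix):
                counts[term] = counts.get(term, 0) + 1
    return counts


inverted = build_inverted(DOCS)
terms, per_doc = uninvert(inverted, len(DOCS))
# Pretend the filter query matched docs 0 and 1 only.
print(facet_prefix(terms, per_doc, [0, 1], "s"))
```

Once `per_doc` exists, each request touches only the documents that pass the filter, which is consistent with the fast subsequent queries reported in the original message.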


How many unique values do you have, or contemplate, for your two billion
documents?


Two possibilities:

1. You need a lot more hardware.
2. You need to scale back your ambitions.

-- Jack Krupansky

-Original Message-
From: Greg Preston
Sent: Tuesday, August 20, 2013 2:00 PM
To: solr-user@lucene.apache.org
Subject: Autosuggest on very large index

Using 4.4.0 -

I would like to be able to do an autosuggest query against one of the
fields in our index and have the results be limited by an fq.

I can get exactly the results I want with a facet query using a
facet.prefix, but the first query takes ~5 minutes to run on our QA
env (~240M docs).  I'm afraid to attempt it on prod (~2B docs).
Subsequent queries are sufficiently fast (~500ms).

I'm assuming the first query is uninverting the field.  Is there any
way to mark that field so that an uninverted copy is maintained as
updates come in?  We plan to soft commit every 5 minutes, and we'd
prefer to not be continuously uninverting this one field.

Or is there a better way to do what I'm trying to do?  I've looked at
the spellcheck component a little bit, but it looks like I can't
filter results by fq.  The fq I'm using is based on which client is
logged in, and we can't autosuggest terms from one client to another.

Thanks.

-Greg 



Re: Autosuggest on very large index

2013-08-20 Thread Greg Preston
The filter query would be on a different field (clientId) than the
field we want to autosuggest on (title).

Or are you proposing we index a compound field that would be
clientId+titleTokens so we would then prefix the suggester with
clientId+userInput ?

Interesting idea.
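
A rough Python sketch of the compound-key idea, for illustration only. A sorted list with `bisect` stands in for the Suggester's FST here, and the separator, `make_key`, `build_keys`, and `suggest` are all hypothetical names, not Solr APIs. The assumption is that every suggestion entry is stored as clientId + separator + title, so a prefix lookup on clientId + separator + userInput can only ever return entries belonging to that client.

```python
import bisect

# Hypothetical separator; it must never occur inside a clientId.
SEP = "\x1f"


def make_key(client_id, text):
    """Compound key: clientId, separator, then the title (or typed prefix)."""
    return f"{client_id}{SEP}{text}"


def build_keys(rows):
    """Sorted, de-duplicated compound keys from (clientId, title) pairs."""
    return sorted({make_key(client_id, title) for client_id, title in rows})


def suggest(keys, client_id, user_input, limit=5):
    """Return up to `limit` titles for this client that start with user_input."""
    prefix = make_key(client_id, user_input)
    i = bisect.bisect_left(keys, prefix)  # first key >= prefix
    results = []
    while i < len(keys) and len(results) < limit and keys[i].startswith(prefix):
        # Strip the "clientId + SEP" part before returning the title.
        results.append(keys[i][len(client_id) + len(SEP):])
        i += 1
    return results


rows = [
    ("a", "solr search"),
    ("a", "solr suggest"),
    ("b", "solr secrets"),
    ("a", "lucene"),
]
keys = build_keys(rows)
print(suggest(keys, "a", "solr s"))
```

The separator matters: without it, a lookup for client "a" could also match keys belonging to a client named "ab".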

-Greg


On Tue, Aug 20, 2013 at 11:21 AM, Markus Jelsma wrote:
> I am not entirely sure but the Suggester's FST uses prefixes so you may be 
> able to prefix the value you otherwise use for the filter query when you 
> build the suggester.
>
> -Original message-
>> From: Greg Preston
>> Sent: Tuesday 20th August 2013 20:00
>> To: solr-user@lucene.apache.org
>> Subject: Autosuggest on very large index
>>
>> Using 4.4.0 -
>>
>> I would like to be able to do an autosuggest query against one of the
>> fields in our index and have the results be limited by an fq.
>>
>> I can get exactly the results I want with a facet query using a
>> facet.prefix, but the first query takes ~5 minutes to run on our QA
>> env (~240M docs).  I'm afraid to attempt it on prod (~2B docs).
>> Subsequent queries are sufficiently fast (~500ms).
>>
>> I'm assuming the first query is uninverting the field.  Is there any
>> way to mark that field so that an uninverted copy is maintained as
>> updates come in?  We plan to soft commit every 5 minutes, and we'd
>> prefer to not be continuously uninverting this one field.
>>
>> Or is there a better way to do what I'm trying to do?  I've looked at
>> the spellcheck component a little bit, but it looks like I can't
>> filter results by fq.  The fq I'm using is based on which client is
>> logged in, and we can't autosuggest terms from one client to another.
>>
>> Thanks.
>>
>> -Greg


RE: Autosuggest on very large index

2013-08-20 Thread Markus Jelsma
I am not entirely sure but the Suggester's FST uses prefixes so you may be able 
to prefix the value you otherwise use for the filter query when you build the 
suggester.
 
-Original message-
> From: Greg Preston
> Sent: Tuesday 20th August 2013 20:00
> To: solr-user@lucene.apache.org
> Subject: Autosuggest on very large index
> 
> Using 4.4.0 -
> 
> I would like to be able to do an autosuggest query against one of the
> fields in our index and have the results be limited by an fq.
> 
> I can get exactly the results I want with a facet query using a
> facet.prefix, but the first query takes ~5 minutes to run on our QA
> env (~240M docs).  I'm afraid to attempt it on prod (~2B docs).
> Subsequent queries are sufficiently fast (~500ms).
> 
> I'm assuming the first query is uninverting the field.  Is there any
> way to mark that field so that an uninverted copy is maintained as
> updates come in?  We plan to soft commit every 5 minutes, and we'd
> prefer to not be continuously uninverting this one field.
> 
> Or is there a better way to do what I'm trying to do?  I've looked at
> the spellcheck component a little bit, but it looks like I can't
> filter results by fq.  The fq I'm using is based on which client is
> logged in, and we can't autosuggest terms from one client to another.
> 
> Thanks.
> 
> -Greg


Autosuggest on very large index

2013-08-20 Thread Greg Preston
Using 4.4.0 -

I would like to be able to do an autosuggest query against one of the
fields in our index and have the results be limited by an fq.

I can get exactly the results I want with a facet query using a
facet.prefix, but the first query takes ~5 minutes to run on our QA
env (~240M docs).  I'm afraid to attempt it on prod (~2B docs).
Subsequent queries are sufficiently fast (~500ms).
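
For reference, the kind of request described above might be built roughly as follows. This is a sketch, not the exact query in use: the field names `title` and `clientId` come from later in the thread, while the base URL, the core name, and the helper `build_suggest_url` are assumptions made for illustration. `facet.mincount=1` is included on the assumption that terms with a zero count under the fq should not be returned.

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, client_id, user_input, limit=10):
    """Build a Solr select URL that facets on `title` with a prefix,
    restricted to one client by a filter query."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                    # only facet counts are needed
        ("fq", f"clientId:{client_id}"),  # restrict to the logged-in client
        ("facet", "true"),
        ("facet.field", "title"),
        ("facet.prefix", user_input),     # what the user has typed so far
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),          # hide terms with no matching docs
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name.
print(build_suggest_url("http://localhost:8983/solr/collection1", "42", "sol"))
```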

I'm assuming the first query is uninverting the field.  Is there any
way to mark that field so that an uninverted copy is maintained as
updates come in?  We plan to soft commit every 5 minutes, and we'd
prefer to not be continuously uninverting this one field.

Or is there a better way to do what I'm trying to do?  I've looked at
the spellcheck component a little bit, but it looks like I can't
filter results by fq.  The fq I'm using is based on which client is
logged in, and we can't autosuggest terms from one client to another.

Thanks.

-Greg