Re: Search Suggestion Filtering
Regarding LUCENE-5350, the context is the filter, i.e. the context is prepended to every entry (suggestion) that belongs to that context. So when a user looks up the entry foo in the context bar, the actual lookup is bar(ctx_separator)foo. This filters out entries that match foo in another context. For details on the use cases, see the description of the JIRA issue.

Regarding SOLR-5378, the final documentation is unfortunately not available yet (I will try to fix that ASAP). For now, you can look at the description of SOLR-5378 (outdated response format) along with the description of SOLR-5529 (updated response format). I have linked the related JIRAs in SOLR-5378.

Hope that helps,
Areek

On Mon, Jan 20, 2014 at 2:23 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

Hi guys, following this thread I have some questions: 1) Regarding LUCENE-5350, what is the context that is mentioned? Is the context a filter query? 2) Regarding https://issues.apache.org/jira/browse/SOLR-5378, do we have the final documentation available? Cheers
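The context-as-prefix lookup described in the reply above can be illustrated with a minimal Python sketch. This is not the LUCENE-5350 implementation: the separator value, the sorted-list index, and the names build_index and lookup are assumptions made purely for illustration.

```python
from bisect import bisect_left

# Assumed separator; the thread only calls it "ctx_separator" and gives no value.
CTX_SEPARATOR = "\x1f"


def build_index(entries):
    """Build a sorted list of keys from (context, suggestion) pairs.

    Each suggestion is stored with its context prepended.
    """
    return sorted(f"{ctx}{CTX_SEPARATOR}{text}" for ctx, text in entries)


def lookup(index, context, prefix, limit=5):
    """Return suggestions that start with `prefix` inside `context` only."""
    key = f"{context}{CTX_SEPARATOR}{prefix}"
    start = bisect_left(index, key)  # first key >= context + separator + prefix
    results = []
    for candidate in index[start:]:
        if not candidate.startswith(key) or len(results) >= limit:
            break
        # Strip the context again before handing the suggestion back.
        results.append(candidate.split(CTX_SEPARATOR, 1)[1])
    return results


index = build_index([
    ("bar", "foo fighters"),
    ("baz", "foo bar"),
    ("bar", "food"),
])
print(lookup(index, "bar", "foo"))
print(lookup(index, "baz", "foo"))
```

Because the context is part of the key, an entry that matches foo in context baz can never be returned for a lookup in context bar.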
Re: Search Suggestion Filtering
Hi guys,

following this thread I have some questions:

1) Regarding LUCENE-5350, what is the context that is mentioned? Is the context a filter query?
2) Regarding https://issues.apache.org/jira/browse/SOLR-5378, do we have the final documentation available?

Cheers

2014/1/16 Hamish Campbell hamish.campb...@koordinates.com

Thank you Jorge. We looked at phrase suggestions from previous user queries, but they're not so useful in our case. [...]
Re: Search Suggestion Filtering
In a custom application we have, we use a separate core (under Solr 3.6.1) to store the queries used by the users and then provide the autocomplete feature. In our case we need to filter out some phrases that we don't want to be suggested to the users. I built a custom UpdateRequestProcessor to implement this logic, so we define these blocking patterns in some external source of information (DB, files, etc.).

For the suggestions per se we use as a base the https://github.com/cominvent/autocomplete configuration, described in www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/, which is pretty usable as it comes. I (personally) found this approach far more flexible than the original suggester component, but it involves storing the users' queries in a separate core.

Greetings,

----- Original Message -----
From: Hamish Campbell hamish.campb...@koordinates.com
To: solr-user@lucene.apache.org
Sent: Wednesday, January 15, 2014 9:10:16 PM
Subject: Re: Search Suggestion Filtering

Thanks Tomás, I'll take a look. Still interested to hear from anyone about using queries to populate the list - I'm willing to give up a bit of performance for the flexibility it would provide.
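The blocking-pattern filter Jorge describes could look roughly like the following. His actual code is a custom Solr UpdateRequestProcessor (Java) that is not shown in the thread, so this Python sketch only illustrates the filtering logic. The pattern format (one regular expression per line, with # marking comments) and all function names are assumptions.

```python
import re


def load_patterns(lines):
    """Compile one case-insensitive regex per non-empty, non-comment line.

    `lines` stands in for the external source (DB rows, a file, etc.).
    """
    patterns = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        patterns.append(re.compile(text, re.IGNORECASE))
    return patterns


def is_blocked(query, patterns):
    """True if any blocking pattern matches somewhere in the query."""
    return any(p.search(query) for p in patterns)


def filter_queries(queries, patterns):
    """Keep only the user queries that may be stored for autocomplete."""
    return [q for q in queries if not is_blocked(q, patterns)]


patterns = load_patterns(["# blocked phrases", "", r"\bconfidential\b", "^test "])
print(filter_queries(
    ["solr suggester", "Confidential report", "test query", "latest news"],
    patterns,
))
```

In the setup described above, a check like is_blocked would run before a user query is written to the separate autocomplete core, so blocked phrases never reach the suggestion index.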
Re: Search Suggestion Filtering
Thank you Jorge. We looked at phrase suggestions from previous user queries, but they're not so useful in our case. However, I have a follow-up question about similar functionality that I'll post shortly.

The list might like to know that I've come up with a quick and exceedingly dirty <strike>hack</strike> solution that works for our limited case. You have been warned! Note that we're using django-haystack to actually interact with Solr:

1. Set nonFuzzyPrefix of the Suggester to 4.
2. At index time, the haystack index builds suggestion terms by extracting the relevant terms and prefixing each with a 4 (alpha) character reference for the target instance.
3. At search time, the user's query is split, and the terms are prefixed and concatenated. The new query is sent to Solr and the results are cleaned of references before being returned to the front end.

I'm not proud of it, but it works. =D

On Fri, Jan 17, 2014 at 3:13 AM, Jorge Luis Betancourt González jlbetanco...@uci.cu wrote:

In a custom application we have, we use a separate core (under Solr 3.6.1) to store the queries used by the users and then provide the autocomplete feature. [...]

--
Hamish Campbell
Koordinates Ltd
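The three steps of the prefixing workaround can be sketched in plain Python as follows. This is not Hamish's code and it does not use the django-haystack API. The base-26 scheme for deriving the 4-letter reference, joining terms with spaces, and all function names are assumptions for illustration.

```python
REF_LENGTH = 4  # matches nonFuzzyPrefix=4, so fuzzy matching never alters the reference


def tenant_ref(tenant_id):
    """Map a numeric tenant id to a fixed 4-letter reference (hypothetical scheme)."""
    if not 0 <= tenant_id < 26 ** REF_LENGTH:
        raise ValueError("tenant_id does not fit in a 4-letter reference")
    letters = []
    for _ in range(REF_LENGTH):
        tenant_id, rem = divmod(tenant_id, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def index_terms(tenant_id, text):
    """Index time: prefix every extracted term with the tenant reference."""
    ref = tenant_ref(tenant_id)
    return [ref + term for term in text.lower().split()]


def prefix_query(tenant_id, query):
    """Search time: split the user's query, prefix each term, and re-join."""
    ref = tenant_ref(tenant_id)
    return " ".join(ref + term for term in query.lower().split())


def clean_suggestions(tenant_id, suggestions):
    """Strip the tenant reference from every word before returning results."""
    ref = tenant_ref(tenant_id)
    cleaned = []
    for suggestion in suggestions:
        words = [w[len(ref):] if w.startswith(ref) else w for w in suggestion.split()]
        cleaned.append(" ".join(words))
    return cleaned


print(index_terms(1, "Road Map"))
print(prefix_query(1, "roa ma"))
print(clean_suggestions(1, ["aaabroad aaabmap"]))
```

Keeping the reference at exactly four characters is what makes step 1 matter: with nonFuzzyPrefix set to 4, the fuzzy lookup must match the reference exactly, so suggestions from another tenant's terms are not reachable through edits.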
Search Suggestion Filtering
Hi all,

I'm looking into options for filtering the search suggestions dictionary.

Using Solr 4.6.0, the Suggester component and fst.FuzzyLookupFactory with a field-based dictionary, we're indexing records for a multi-tenanted SaaS platform. SearchHandler records are always filtered by the particular client warehouse (e.g. by domain), however we need a way to apply a similar filter to the spell check dictionary to prevent leaking terms between clients. In other words: when client A searches for a document title, they should not receive spelling suggestions for client B's document titles.

This has been asked a couple of times, on the mailing list and on StackOverflow. Some of the suggested approaches:

1. Use dynamic fields to create dictionaries per-warehouse (mentioned here: http://lucene.472066.n3.nabble.com/Filtering-down-terms-in-suggest-tt4069627.html )

That might be a reasonable option for us (we already considered a similar approach), but at what point does this stop scaling efficiently? How many dynamic fields are too many?

2. Run a query to populate the suggestion list (also mentioned in that thread)

If I understand this correctly, this would give us a lot of flexibility and power: for example, to give a more nuanced result set using the user's permissions to expose private documents in their spelling suggestions. I expect this would be a slow query, but our total document count is currently relatively small (on the order of 10^3 objects) and I imagine you could create a specific word index with the appropriate fields to keep this in check. Is this a feasible approach, and if so, how do you build a dynamic suggestion list?

3. Other options:

It seems like this is a common problem - and we could throw some resources at building an extension to provide some limited suggestion dictionary filtering. Is anyone already doing something similar, or has found a clever hack around this, or can suggest a starting point?

Thanks everyone!

--
Hamish Campbell
Koordinates Ltd
http://koordinates.com/?_bzhc=esig
PH +64 9 966 0433
FAX +64 9 966 0045
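For option 2 (running a query to populate the suggestion list), one possible shape, not confirmed by anyone in this thread, is a filtered facet query: restrict the documents with fq and let facet.prefix return only the indexed terms that start with what the user has typed. The sketch below only builds the request URL. The core URL and the field names title_terms and domain are hypothetical; the request parameters themselves (fq, facet, facet.field, facet.prefix, facet.limit, facet.mincount) are standard Solr parameters.

```python
from urllib.parse import urlencode


def suggestion_query_url(base_url, tenant_domain, typed_prefix,
                         field="title_terms", tenant_field="domain", limit=10):
    """Build a Solr select URL that facets on `field`, filtered to one tenant.

    `field` and `tenant_field` are hypothetical schema names.
    """
    # Escape backslashes and quotes so the domain is safe inside a quoted phrase.
    safe_domain = tenant_domain.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", "*:*"),
        ("rows", 0),                                # only the facet counts are needed
        ("fq", f'{tenant_field}:"{safe_domain}"'),  # tenant filter, as for normal searches
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", typed_prefix.lower()),     # assumes the field is lowercased at index time
        ("facet.limit", limit),
        ("facet.mincount", 1),                      # skip terms with no matching documents
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


print(suggestion_query_url("http://localhost:8983/solr/core1",
                           "client-a.example.com", "Roa"))
```

Because the same fq that protects normal searches is applied here, terms that occur only in another client's documents have a count of zero and are dropped by facet.mincount. This gives prefix completion rather than fuzzy spelling correction, so it is a trade-off against the FuzzyLookupFactory setup described above.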
Re: Search Suggestion Filtering
I think your use case is the one described in LUCENE-5350; you may want to take a look at the patch and comments there.

Tomás

On Wed, Jan 15, 2014 at 12:58 PM, Hamish Campbell hamish.campb...@koordinates.com wrote:

Hi all, I'm looking into options for filtering the search suggestions dictionary. [...]
Re: Search Suggestion Filtering
Thanks Tomás, I'll take a look.

Still interested to hear from anyone about using queries to populate the list - I'm willing to give up a bit of performance for the flexibility it would provide.

On Thu, Jan 16, 2014 at 1:06 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:

I think your use case is the one described in LUCENE-5350, maybe you want to take a look at the patch and comments there. Tomás

--
Hamish Campbell
Koordinates Ltd
Re: Search Suggestion Filtering
Hey Hamish,

You might want to check out LUCENE-5402. I added support for index-time pruning for suggesters that consume from the index itself. I plan to add this support to file-based suggesters as well. In order to use this functionality from Solr, more changes are required. I am planning to support this in the new SuggesterComponent (SOLR-5378) in Solr.

Hope that helps!
Areek

On Wed, Jan 15, 2014 at 6:10 PM, Hamish Campbell hamish.campb...@koordinates.com wrote:

Thanks Tomás, I'll take a look. Still interested to hear from anyone about using queries to populate the list - I'm willing to give up a bit of performance for the flexibility it would provide.