SOLR deduplication
Hi - I have the SOLR deduplication configured and working well. Is there any way I can tell which documents have not been added to the index as a result of the deduplication rejecting subsequent identical documents? Many Thanks Jason Brown. If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
Dismax score - maximum of any one field?
Can anyone tell me how the dismax score is computed? Is it the maximum score for any of the component fields that are searched? Thank You.
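As a rough illustration of the question above: in a disjunction-max combination, the best-scoring field wins for each query term, and the other matching fields contribute only through the tie-breaker (the dismax `tie` parameter). The sketch below is a simplified model with made-up per-field scores, not Solr's actual code, and it ignores the additional contributions from pf, bf and multiple query terms.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine the per-field scores for one query term, disjunction-max style.

    The highest field score is taken as-is; every other matching field only
    adds tie * its score. With tie=0.0 this is a pure maximum.
    """
    scores = list(field_scores)
    if not scores:
        return 0.0
    best = max(scores)
    others = sum(scores) - best
    return best + tie * others


# Made-up scores for one term matched in three fields.
scores = {"title": 2.0, "keywords": 1.0, "content": 0.5}

print(dismax_score(scores.values(), tie=0.0))  # pure maximum
print(dismax_score(scores.values(), tie=0.1))  # maximum plus 10% of the rest
```

So with tie left at 0 the answer to the question is "yes, the maximum"; a tie above 0 lets the other fields nudge the score upwards.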
De-duplication not working as I expected - duplicates still getting into the index
I have configured de-duplication according to the Wiki. My signature field is defined thus... and my updateRequestProcessor as follows

true false signature content org.apache.solr.update.processor.Lookup3Signature

I am using SOLRJ to write to the index with the binary (as opposed to XML) format, so my update handler is defined as below.

dedupe

However, I was expecting SOLR to only allow 1 instance of a duplicate document into the index, but I get the following results when I query my index... I have deliberately added my ISA Letter file 4 times and can see it has correctly generated an identical signature for the first 4 entries (d91a5ce933457fd5). The fifth entry is a different document and correctly has a different signature. I was expecting to only see 1 instance of the duplicate. Am I misinterpreting the way it works? Many Thanks.

? ISA Letter d91a5ce933457fd5
? ISA Letter d91a5ce933457fd5
? ISA Letter d91a5ce933457fd5
? ISA Letter d91a5ce933457fd5
? ISA Mailing pack letter fd9d9e1c0de32fb5
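One possible reading of the result above, offered as an assumption: the XML tags of the processor config were lost in the archive, but if the surviving values "true false" correspond to the wiki's enabled and overwriteDupes settings, then overwriteDupes is false. In that case the signature is computed and stored, but earlier documents with the same signature are not removed. The toy model below imitates that behaviour with a plain dict; the ids and content strings are hypothetical, and MD5 stands in for Lookup3Signature.

```python
import hashlib


def signature(doc, fields):
    """Toy stand-in for Lookup3Signature: hash the chosen fields (MD5 here)."""
    joined = "|".join(str(doc[name]) for name in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()[:16]


def add_document(index, doc, fields, overwrite_dupes):
    """Add doc to a dict-based 'index' keyed by a hypothetical unique id."""
    stored = dict(doc, signature=signature(doc, fields))
    if overwrite_dupes:
        # Remove any earlier document that carries the same signature.
        dupes = [k for k, d in index.items() if d["signature"] == stored["signature"]]
        for key in dupes:
            del index[key]
    index[stored["id"]] = stored
    return index


def build_index(docs, overwrite_dupes):
    index = {}
    for doc in docs:
        add_document(index, doc, ["content"], overwrite_dupes)
    return index


# Four copies of the same letter plus one different document (made-up data).
docs = [
    {"id": 1, "title": "ISA Letter", "content": "isa letter text"},
    {"id": 2, "title": "ISA Letter", "content": "isa letter text"},
    {"id": 3, "title": "ISA Letter", "content": "isa letter text"},
    {"id": 4, "title": "ISA Letter", "content": "isa letter text"},
    {"id": 5, "title": "ISA Mailing pack letter", "content": "mailing pack text"},
]

print(len(build_index(docs, overwrite_dupes=False)))  # 5
print(len(build_index(docs, overwrite_dupes=True)))   # 2
```

If this reading is right, switching overwriteDupes to true would be the thing to try first.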
RE: Boost on newer documents
Hi - you do understand my case - we tried what you suggested, but as the relevancy is very precise we couldn't get it to do a dual-sort. I like the idea of using one of the dismax parameters (bf) to in effect increase the boost on a newer document. Thanks for all replies, most useful.

-Original Message-
From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com]
Sent: Tue 30/11/2010 09:26
To: solr-user@lucene.apache.org
Subject: Re: Boost on newer documents

hi, I might not understand your case right but can you not add an extra publishedDate field and then specify a secondary (after relevance) sort by that?

On 30 November 2010 08:05, wrote:
> You could also put a short representation of the data (I suggest days since 01.01.2010) as payload and calculate boost with payload function of the similarity.
>
> >-Original Message-----
> >From: ext Jason Brown [mailto:jason.br...@sjp.co.uk]
> >Sent: Monday, 29 November 2010 17:28
> >To: solr-user@lucene.apache.org
> >Subject: Boost on newer documents
> >
> >Hi,
> >
> >I use the dismax query to search across several fields.
> >
> >I find I have a lot of documents with the same document name (one of the fields that the dismax queries) so I wanted to adjust the relevance so that titles with a newer published date have a higher relevance than documents with the same title but are older. Does anyone know how I can achieve this?
> >
> >Thank You
> >
> >Jason.
RE: Boost on newer documents
Great - Thank You.

-Original Message-
From: Mat Brown [mailto:m...@patch.com]
Sent: Mon 29/11/2010 16:33
To: solr-user@lucene.apache.org
Subject: Re: Boost on newer documents

Hi Jason,

You can use boost functions in the dismax handler to do this: http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29

Mat

On Mon, Nov 29, 2010 at 11:28, Jason Brown wrote:
> Hi,
>
> I use the dismax query to search across several fields.
>
> I find I have a lot of documents with the same document name (one of the fields that the dismax queries) so I wanted to adjust the relevance so that titles with a newer published date have a higher relevance than documents with the same title but are older. Does anyone know how I can achieve this?
>
> Thank You
>
> Jason.
Boost on newer documents
Hi, I use the dismax query to search across several fields. I find I have a lot of documents with the same document name (one of the fields that the dismax queries), so I wanted to adjust the relevance so that titles with a newer published date have a higher relevance than documents with the same title that are older. Does anyone know how I can achieve this? Thank You Jason.
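The bf (boost function) approach suggested in the replies to this thread could be sketched roughly as follows. The field names (title, content, publishedDate) and the query text are assumptions for illustration; the function shape `recip(ms(NOW,field),3.16e-11,1,1)` is the commonly quoted recency boost, where `recip(x,m,a,b)` evaluates to `a/(m*x+b)`. The small Python `recip` below only mirrors that arithmetic so the effect of document age is visible.

```python
from urllib.parse import urlencode


def recip(x, m, a, b):
    """Same arithmetic as the recip function query: a / (m * x + b)."""
    return a / (m * x + b)


MS_PER_YEAR = 365.25 * 24 * 3600 * 1000

# How the boost decays with document age (in milliseconds).
boost_new = recip(0, 3.16e-11, 1, 1)                     # published just now
boost_one_year = recip(MS_PER_YEAR, 3.16e-11, 1, 1)      # roughly half
boost_two_years = recip(2 * MS_PER_YEAR, 3.16e-11, 1, 1)  # roughly a third

# Hypothetical dismax request carrying the boost function.
params = {
    "defType": "dismax",
    "q": "isa letter",
    "qf": "title^2 content",
    "bf": "recip(ms(NOW,publishedDate),3.16e-11,1,1)",
}
query_string = urlencode(params)
print(query_string)
```

Because bf is added to the relevance score rather than multiplied into it, its weight relative to the text match usually needs some tuning against real queries.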
RE: Synonym Filtering on String Fields
Thanks Erick - I do want exactly that: multiple terms generated from my string field, i.e.

I want the single term fund manager summary to be turned into 2 terms -> fund manager summary, fund manager report
I want the single term guide to be turned into the 2 terms -> guide, product guide

I am using "term" synonymously with what will be in the index. (I appreciate the outputs of the synonym filter won't be stored per se, just added as terms to the index.)

The problem I was having is that I am doing this on a field as I described below and was having problems with the multi-word terms. The behaviour is:

guide is getting turned into 3 terms: guide, product, guide (3 terms, I only want 2, guide and product guide)
fund manager summary and fund manager report were not having any impact on the synonym filter; the output was the same as the input.

I need these as strings (I don't search on this field, it's just for faceting). I have another text field which I do the search on. I will give Ahmet's comments a go. Thanks All.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Fri 26/11/2010 14:16
To: solr-user@lucene.apache.org
Subject: Re: Synonym Filtering on String Fields

Besides Ahmet's comments, I have to wonder if you want to do this in a single field? The problem is that you're expanding your synonyms into a field. Let's say you expand "memory" into "memory", "recall" and "RAM". Now you have three tokens in your field. What does faceting mean now? Perhaps you would be better off using the directive to make a field for faceting and use Solr text type for your searchable field? Of course this may be way off base.

About your point (1), you say synonyms aren't getting picked up. You might be getting fooled by seeing the stored value. Look in the admin page under "schema browser" to see the terms in the index, which would have the synonyms. Just selecting the document via search will only show you the stored values which would NOT have the synonyms.
Best Erick On Fri, Nov 26, 2010 at 5:15 AM, Jason Brown wrote: > > I have the following field type set up in my schema. The idea is to fire > phrases of text such as 'fund manager summary' (without the quotes) at it, > and for the synonym processing to recognise this, and add the rest of the > synonyms (index-time synonym processing with expansion) to the index from my > synonym file (example below) > > positionIncrementGap="100"> > > > ignoreCase="true" expand="true"/> > > > > > > > > in synonyms.txt. > > fund manager summary, fund manager report > guide, product guide > > I run into 2 issues... > > (1) After analysis of the field in SOLR, I find that both > > fund manager summay > fund manage report > > are NOT getting picked up by the synonym factory (after processing I just > get the source term outputted from the synonym filter) > > (2) If I analyse guide, I do get product and guide (*2) outputted from the > synonym filter factory - but as seperate terms (3 terms in total), I > expected it to generate just 1 additional term - i.e. product guide > > It seems that it is able to pick up a single word and output two (as > seperate terms), but it fails to pick up multiple words. > > Can anyone help? (incidentally when I use this approach on a SOLR text > field type it all works fine, but I cant use a SOLR text field type for this > as I use this field for facetting. > > > > If you wish to view the St. James's Place email disclaimer, please use the > link below > > http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer > If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
Synonym Filtering on String Fields
I have the following field type set up in my schema. The idea is to fire phrases of text such as 'fund manager summary' (without the quotes) at it, and for the synonym processing to recognise this, and add the rest of the synonyms (index-time synonym processing with expansion) to the index from my synonym file (example below).

In synonyms.txt:

fund manager summary, fund manager report
guide, product guide

I run into 2 issues...

(1) After analysis of the field in SOLR, I find that both

fund manager summary
fund manager report

are NOT getting picked up by the synonym factory (after processing I just get the source term outputted from the synonym filter).

(2) If I analyse guide, I do get product and guide (*2) outputted from the synonym filter factory, but as separate terms (3 terms in total). I expected it to generate just 1 additional term, i.e. product guide.

It seems that it is able to pick up a single word and output two (as separate terms), but it fails to pick up multiple words.

Can anyone help? (Incidentally, when I use this approach on a SOLR text field type it all works fine, but I can't use a SOLR text field type for this as I use this field for faceting.)
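A toy model that reproduces both symptoms described above, under the assumption that the entries in synonyms.txt are split on whitespace when the rules are loaded, while the field value itself arrives as one single token. This is not Solr's implementation, only a sketch of why the two could disagree; the function names are invented for the illustration. As far as I know, SynonymFilterFactory accepts a tokenizerFactory attribute (for example solr.KeywordTokenizerFactory) that controls how the synonym file is parsed, which corresponds to the `keyword` variant below.

```python
def whitespace(text):
    """Split a synonym entry into one token per word."""
    return text.split()


def keyword(text):
    """Keep a synonym entry as one single token."""
    return [text]


def parse_rules(lines, tokenize):
    """Map each tokenized entry to the full group of entries on its line."""
    rules = {}
    for line in lines:
        entries = [tuple(tokenize(entry.strip())) for entry in line.split(",")]
        for entry in entries:
            rules[entry] = entries
    return rules


def expand(field_value, rules):
    """Expand a field value that reaches the filter as ONE token."""
    group = rules.get((field_value,))
    if group is None:
        return [field_value]  # no rule matched, output equals input
    terms = []
    for entry in group:
        terms.extend(entry)
    return terms


synonym_lines = [
    "fund manager summary, fund manager report",
    "guide, product guide",
]

ws_rules = parse_rules(synonym_lines, whitespace)
kw_rules = parse_rules(synonym_lines, keyword)

print(expand("guide", ws_rules))                 # ['guide', 'product', 'guide']
print(expand("fund manager summary", ws_rules))  # ['fund manager summary']
print(expand("guide", kw_rules))                 # ['guide', 'product guide']
print(expand("fund manager summary", kw_rules))  # ['fund manager summary', 'fund manager report']
```

With whitespace-parsed rules the multi-word entries can never equal a single-token field value, and "product guide" comes out as two terms; with keyword-parsed rules both cases give the two terms the poster expected.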
Synonym processing at index time
Good Morning - I will explain my current config/functionality. I have 4 fields in my index...

1) Doc Title - a text field
2) Keyword Phrase, e.g. fund manager, a text field (with some edge n-gram functionality at index time)
3) Keyword Phrase, e.g. fund manager, a string field (for faceting)
4) Content Field, i.e. my full document text, a text field

I have a nice bit of auto-complete functionality in my UI which works as follows... the user searches -> fund ma and my service layer calls SOLR to say please find all docs with fund and ma in them. My search results are fine. I also ask for facets and counts in this same query so I can use them in my auto-complete (I ask for field (3) above when faceting). This allows me to use the facets and counts to show a nice auto-complete each time a user hits a key. OK so far. I have a nice auto-complete based upon business domain Keyword Phrases.

Now, on to synonyms. For example, fund manager and fund lead are the same thing in my business domain. I was planning on simply adding the synonyms as normal entries into fields 2 and 3 (both multi-valued fields) so that they would be inserted into the index and be available for my auto-complete. This would be OK and, to clarify, nothing to do with the synonyms.txt file at this point.

However, as SOLR has synonym processing I should take advantage of it (also, at this point my synonym fund lead would not have found its way into field 4 (full text of the document) where fund manager was in the content). So I believe I should do something like...

fund manager, fund lead

...in my synonym file that I only want to process at index time (so it appears in my autocomplete) with expansion on. Wherever fund manager or fund lead is found, I want the index to have both fund manager and fund lead. As I have expansion on and have multi-word synonyms (phrases as both a source and target), using the synonym file at index time seems best. However, I am very confused at this point.
I can see how the synonym file would be processed correctly for field 3 (a string field), and both terms fund manager and fund lead should go into the index OK. But I can't see how it would work for the text fields (2 and 4). My index-time filter chain has synonym processing as per the default text field processing (after whitespace tokenisation), so I can't see how my terms fund manager and fund lead can be found by the synonym filter. I've looked in the book by Eric Pugh and they say that for multi-word synonyms to work you must use synonyms at index time and with expansion. They say you can't do synonym processing at query time as synonym phrases aren't recognised after whitespace parsing, but my index chain (and the default SOLR config for text fields) also whitespace parses. It would be great to take advantage of synonym processing by SOLR instead of my original plan, but I am confused how multi-word synonyms can be recognised at index time and added to the index. Am I missing something about index-time processing of synonyms here? Many Thanks for any help/advice. Jason.
RE: Filter by relevance
I have a dismax query where I check for values in 3 fields against documents in the index - a title, a list of keyword tags and then the full text of the document. I usually get lots of results and I can see that the first results are OK - it's giving precedence to titles and tag matches, as my dismax boosts on title and keywords (normal boost and phrase boost). After say 20/30 good results I start to get matches based upon just the full text, so these are less relevant. I am also facet counting on my keyword tags (and presenting them in the results as a way of filtering) and, as you can imagine, the counts are high because of the number of overall results. I want to somehow make the facet counts more associated with the higher relevancy results. My options as I see it are -

1) exclude full-text from the dismax altogether
2) configure the dismax normal boost on full-text to zero, but phrase boost to something higher (the aim here is to only really get a hit on the full-text if my search term is found as a phrase in the full-text)
3) limit my results by relevancy or number of results

If I do (3) above, will the facet counts respect the lower number of results? This is the overall aim really. Thank You Jason.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wed 03/11/2010 23:15
To: solr-user@lucene.apache.org
Subject: Re: Filter by relevance

Be aware, though, that relevance isn't absolute, it's only interesting #within# a query. And it's then normed between 0 and 1. So picking "a certain value" is rarely doing what you think it will. Limiting to the top N docs is usually more reasonable.

But this may be an XY problem. What is it you're trying to accomplish? Perhaps if you state the problem, some other suggestions may be in the offing.

Best
Erick

On Wed, Nov 3, 2010 at 4:48 PM, Jason Brown wrote:
> Is it possible to filter my search results by relevance? For example, anything below a certain value shouldn't be returned?
> > I also retrieve facet counts in my search queries, so it would be useful if > the facet counts also respected the filter on the relevance. > > Thank You. > > Jason. > > If you wish to view the St. James's Place email disclaimer, please use the > link below > > http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer > If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
phrase boost on dismax query
I have 3 fields in my index that I use in a dismax query with boosts and phrase boosts. I've realised that there is 1 field I'm not really interested in at all, unless the search term is in that field as a phrase. Is it realistic to set the normal boost to zero for this field, but the phrase boost to something much higher in order to achieve the desired effect? Thank You
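The idea in the question above could be expressed with request parameters along these lines. This is only a sketch: the field names (title, keywords, content) and boost values are invented, and it assumes a boost of zero is accepted in qf. One caveat, as far as I understand boosts: a zero boost removes the field's contribution to the score, but documents that match only in that field are still returned, so this changes ranking rather than the result set.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical dismax request: "content" contributes nothing for plain term
# matches (qf boost 0) but is rewarded when the whole query matches as a
# phrase in it (pf boost 10).
params = {
    "defType": "dismax",
    "q": "fund manager",
    "qf": "title^2.0 keywords^1.5 content^0",
    "pf": "content^10",
}
query_string = urlencode(params)
print(query_string)

# Round-trip check that the parameters survive URL encoding unchanged.
decoded = parse_qs(query_string)
print(decoded["qf"], decoded["pf"])
```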
Filter by relevance
Is it possible to filter my search results by relevance? For example, anything below a certain value shouldn't be returned? I also retrieve facet counts in my search queries, so it would be useful if the facet counts also respected the filter on the relevance. Thank You. Jason.
facet Prefix (or term prefix)
I am aware of the facet.prefix facility. I am using SOLR to return a faceted field's contents, and I use facet.prefix to restrict what returns from SOLR. This is very useful for predictive search functionality (autocomplete). My only issue is that the field I facet on is a string and could have 2 or 3 words in it, thus this process will only return strings that begin with what the user is typing into my UI search box. It would be useful if I could get facets back where I could match somewhere in the faceted field (not just at the beginning), i.e. is there a facet.contains method? If not, I'll just have to code this in my service layer having received all facets from SOLR (without the prefix). Thanks for any help.
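The service-layer fallback mentioned above might look something like this sketch. It assumes the facet values come back in Solr's flat JSON layout (value, count, value, count, ...); the function names and sample data are invented for the example.

```python
def pair_up(flat_facets):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat_facets[0::2], flat_facets[1::2]))


def facets_containing(facet_counts, typed, limit=10):
    """Keep facet values that contain the typed text anywhere, not just at
    the start, ordered by count (highest first) and then alphabetically."""
    needle = typed.strip().lower()
    matches = [(value, count) for value, count in facet_counts
               if needle in value.lower()]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


# Made-up facet response for the keyword-phrase field.
flat = ["fund manager", 12, "hedge fund", 5, "product guide", 7]

print(facets_containing(pair_up(flat), "fund"))
# [('fund manager', 12), ('hedge fund', 5)]
```

The obvious cost is that every facet value has to be fetched before filtering, so this only suits fields with a modest number of distinct values.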
RE: Dismax phrase boosts on multi-value fields
Thanks - I was hoping it wouldn't match - and I believe you've confirmed it won't in my case as the default positionIncrementGap is set. Many Thanks Jason.

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Thu 21/10/2010 02:27
To: solr-user@lucene.apache.org
Subject: RE: Dismax phrase boosts on multi-value fields

Which is why the positionIncrementGap is set to a high number normally (100 in the sample schema.xml). With this being so, phrases won't match across values in a multi-valued field. If for some reason you were using a dismax ps phrase slop that was higher than your positionIncrementGap, you could get phrase boost matches across individual values. But normally that won't happen unless you do something odd to make it happen because you actually want it to, because positionIncrementGap is 100. If for some reason you wanted to use a phrase slop of over 100 but still make sure it didn't go across individual value boundaries you could just set positionIncrementGap to something absurdly high (I'm not entirely sure why it isn't something absurdly high in the sample schema.xml, instead of the high-but-not-absurdly-so 100, since most people will probably expect individual values to be entirely separate).

Jason, are you _trying_ to make that happen, or hoping it won't? Ordinarily, it won't.

From: Erick Erickson [erickerick...@gmail.com]
Sent: Wednesday, October 20, 2010 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields

Well, it all depends (tm). Your example wouldn't match, but if you didn't have an increment gap greater than 1, "black cat his blue" #would# match.

Best
Erick

On Wed, Oct 20, 2010 at 3:22 AM, Jason Brown wrote:
> Thanks Jonathan.
>
> To further clarify, I understand the match of
>
> my blue rabbit
>
> would have to be found in 1 element (of my multi-valued defined field) for the phrase boost on that field to kick in.
> > If for example my document had the following 3 entries for the multi-value > field > > > my black cat > his blue car > her pink rabbit > > Then I assume the phrase boost would not kick-in as the search term (my > blue rabbit) isnt found in a single element (but can be found across them). > > Thanks again > > Jason. > > > > From: Jonathan Rochkind [mailto:rochk...@jhu.edu] > Sent: Tue 19/10/2010 17:27 > To: solr-user@lucene.apache.org > Subject: Re: Dismax phrase boosts on multi-value fields > > > > You are correct. The query needs to match as a phrase. It doesn't need > to match "everything". Note that if a value is: > > "long sentence with my blue rabbit in it", > > then query "my blue rabbit" will also match as a phrase, for phrase > boosting or query purposes. > > Jonathan > > Jason Brown wrote: > > > > > > Hi - I have a multi-value field, so say for example it consists of > > > > 'my black cat' > > 'my white dog' > > 'my blue rabbit' > > > > The field is whitespace parsed when put into the index. > > > > I have a phrase query boost configured on this field which I understand > kicks in when my search term is found entirely in this field. > > > > So, if the search term is 'my blue rabbit', then I understand that my > phrase boost will be applied as this is found entirley in this field. > > > > My question/presumption is that as this is a multi-valued field, only 1 > value of the multi-value needs to match for the phrase query boost (given my > very imaginative set of test data :-) above, you can see that this obviously > matches 1 value and not them all) > > > > Thanks for your help. > > > > > > > > > > > > > > If you wish to view the St. James's Place email disclaimer, please use > the link below > > > > http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer > > > > > > > > If you wish to view the St. James's Place email disclaimer, please use the > link below > > http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer > If you wish to view the St. 
James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
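The positionIncrementGap behaviour discussed in this thread can be illustrated with a small model of token positions. It is a simplification: only exact phrases (slop 0) are checked, tokens are split on whitespace, and the gap is simply added to the position counter between values. The function names are invented for the sketch.

```python
def positions(values, gap):
    """Assign a position to every token of a multi-valued field.

    Tokens inside one value are consecutive; between two values the position
    counter jumps by an extra `gap`.
    """
    token_positions = {}
    next_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            next_pos += gap
        for token in value.split():
            token_positions.setdefault(token, []).append(next_pos)
            next_pos += 1
    return token_positions


def phrase_matches(values, phrase, gap):
    """True if the words of `phrase` occur at consecutive positions."""
    token_positions = positions(values, gap)
    words = phrase.split()
    for start in token_positions.get(words[0], []):
        if all(start + offset in token_positions.get(word, [])
               for offset, word in enumerate(words)):
            return True
    return False


field = ["my black cat", "his blue car", "her pink rabbit"]

print(phrase_matches(field, "my blue rabbit", gap=100))      # False
print(phrase_matches(field, "black cat his blue", gap=0))    # True
print(phrase_matches(field, "black cat his blue", gap=100))  # False
```

With a gap of 0 the phrase "black cat his blue" spans two values and still matches, which is the case Erick describes; with the sample schema's gap of 100 it does not, and "my blue rabbit" never matches because the words are not adjacent in any single value.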
RE: Dismax phrase boosts on multi-value fields
Thanks Jonathan. To further clarify, I understand the match of

my blue rabbit

would have to be found in 1 element (of my multi-valued defined field) for the phrase boost on that field to kick in. If for example my document had the following 3 entries for the multi-value field

my black cat
his blue car
her pink rabbit

then I assume the phrase boost would not kick in, as the search term (my blue rabbit) isn't found in a single element (but can be found across them). Thanks again Jason.

From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Tue 19/10/2010 17:27
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields

You are correct. The query needs to match as a phrase. It doesn't need to match "everything". Note that if a value is:

"long sentence with my blue rabbit in it",

then query "my blue rabbit" will also match as a phrase, for phrase boosting or query purposes.

Jonathan

Jason Brown wrote:
> Hi - I have a multi-value field, so say for example it consists of
>
> 'my black cat'
> 'my white dog'
> 'my blue rabbit'
>
> The field is whitespace parsed when put into the index.
>
> I have a phrase query boost configured on this field which I understand kicks in when my search term is found entirely in this field.
>
> So, if the search term is 'my blue rabbit', then I understand that my phrase boost will be applied as this is found entirely in this field.
>
> My question/presumption is that as this is a multi-valued field, only 1 value of the multi-value needs to match for the phrase query boost (given my very imaginative set of test data :-) above, you can see that this obviously matches 1 value and not them all)
>
> Thanks for your help.
Dismax phrase boosts on multi-value fields
Hi - I have a multi-value field, so say for example it consists of

'my black cat'
'my white dog'
'my blue rabbit'

The field is whitespace parsed when put into the index. I have a phrase query boost configured on this field which I understand kicks in when my search term is found entirely in this field. So, if the search term is 'my blue rabbit', then I understand that my phrase boost will be applied as this is found entirely in this field. My question/presumption is that as this is a multi-valued field, only 1 value of the multi-value needs to match for the phrase query boost (given my very imaginative set of test data :-) above, you can see that this obviously matches 1 value and not them all). Thanks for your help.
FW: Dismax phrase boosts on multi-value fields
-Original Message-
From: Jason Brown
Sent: Tue 19/10/2010 13:45
To: d...@lucene.apache.org
Subject: Dismax phrase boosts on multi-value fields

Hi - I have a multi-value field, so say for example it consists of

'my black cat'
'my white dog'
'my blue rabbit'

The field is whitespace parsed when put into the index. I have a phrase query boost configured on this field which I understand kicks in when my search term is found entirely in this field. So, if the search term is 'my blue rabbit', then I understand that my phrase boost will be applied as this is found entirely in this field. My question/presumption is that as this is a multi-valued field, only 1 value of the multi-value needs to match for the phrase query boost (given my very imaginative set of test data :-) above, you can see that this obviously matches 1 value and not them all). Thanks for your help.
RE: What is the maximum number of documents that can be indexed ?
Not related to the opening thread - but wanted to thank Eric for his book. Clarified a lot of stuff and very useful.

-Original Message-
From: Eric Pugh [mailto:ep...@opensourceconnections.com]
Sent: Thu 14/10/2010 15:34
To: solr-user@lucene.apache.org
Subject: Re: What is the maximum number of documents that can be indexed ?

I would recommend looking at the work the HathiTrust has done. They have published some really great blog articles about the work they have done in scaling Solr, and have put in huge amounts of data. The good news is that there isn't an exact number, because "it depends". The bad news is that there isn't an exact number because "it depends"!

Eric

On Oct 13, 2010, at 8:58 PM, Otis Gospodnetic wrote:
> Marco (use solr-u...@lucene list to follow up, please),
>
> There are no precise answers to such questions. Solr can keep indexing. The limit is, I think, the available disk space. I've never pushed Solr or Lucene to the point where Lucene index segments would become a serious pain, but even that can be controlled. Same thing with number of open files, large file support, etc.
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>> From: Marco Ciaramella
>> To: d...@lucene.apache.org
>> Sent: Wed, October 13, 2010 6:19:15 PM
>> Subject: What is the maximum number of documents that can be indexed ?
>>
>> Hi all,
>> I am working on a performance specification document on a Solr/Lucene-based application; this document is intended for the final customer. My question is: what is the maximum number of document I can index assuming 10 or 20kbytes for each document?
>>
>> I could not find a precise answer to this question, and I tend to consider that Solr index can be virtually limited only by the JVM, the Operating System (limits to large file support), or by hardware constraints (mainly RAM, etc. ... ).
>> >> >> Thanks >> Marco >> >> >> - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
RE: multi level faceting
Yes, by adding fq back into the main query you will get results increasingly filtered each time. You may run into an issue if you are displaying facet counts, as the facet part of the query will also obey the increasingly filtered fq, and so will no longer display counts for the other categories of the chosen facet (it depends whether you need to display counts from a facet once the first value from that facet has been chosen, if you get my drift). Local params are a way to deal with this by not subjecting the facet count to the same fq restriction (but allowing the search results to obey it).

-Original Message-
From: Nguyen, Vincent (CDC/OD/OADS) (CTR) [mailto:v...@cdc.gov]
Sent: Mon 04/10/2010 16:34
To: solr-user@lucene.apache.org
Subject: RE: multi level faceting

Ok. Thanks for the quick response.

Vincent Vu Nguyen
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-6154
Century Bldg 2400
Atlanta, GA 30329

-Original Message-
From: Allistair Crossley [mailto:a...@roxxor.co.uk]
Sent: Monday, October 04, 2010 9:40 AM
To: solr-user@lucene.apache.org
Subject: Re: multi level faceting

I think that is just sending 2 fq facet queries through. In Solr PHP I would do that with, e.g.

    $params['facet'] = true;
    $params['facet.fields'] = array('Size');
    $params['fq'] = array('sex' => array('Men', 'Women'));

but yes i think you'd have to send through what the current facet query is and add it to your next drill-down

On Oct 4, 2010, at 9:36 AM, Nguyen, Vincent (CDC/OD/OADS) (CTR) wrote:
> Hi,
>
> I was wondering if there's a way to display facet options based on previous facet values. For example, I've seen many shopping sites where a user can facet by "Mens" or "Womens" apparel, then be shown "sizes" to facet by (for Men or Women only - whichever they chose).
>
> Is this something that would have to be handled at the application level?
> > > > Vincent Vu Nguyen > > > > > If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
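The local-params approach mentioned at the top of this thread is usually written with a tag on the filter and an exclusion on the facet. The sketch below builds such a request; the field names sex and Size come from the thread, while the query text and the tag name "sx" are arbitrary choices for the example.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs is used because facet.field appears more than once.
params = [
    ("q", "shirt"),
    ("facet", "true"),
    # Tag the filter so that facets can refer to it by name.
    ("fq", "{!tag=sx}sex:Men"),
    # Count the "sex" facet as if the tagged filter were not applied,
    # so the other values keep their counts after the drill-down.
    ("facet.field", "{!ex=sx}sex"),
    # The "Size" facet obeys the filter, so only sizes for Men are counted.
    ("facet.field", "Size"),
]
query_string = urlencode(params)
print(query_string)

decoded = parse_qs(query_string)
print(decoded["facet.field"])
```

The search results themselves are still restricted by the fq; only the facet that names the tag in `ex` ignores it.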
Facet Counts Issue when Using Dismax Query Parser in SOLR
I am retrieving facet counts against a specific column in my index and these look accurate. The query for retrieving these counts is also running a dismax search using the q param (against 4 columns in my index, 1 of which I am facet counting on as mentioned above). So far, so good. I show my search results, I show my facets and associated counts.

However, I want the user to be able to 'drill down' by re-running the same search (same q param), but adding in one of the facets to filter the results. Clearly, I can't modify the q parameter to filter against my faceted column (in addition to the previous q value), as dismax won't allow a q param to have a column specified. So I add an fq param to filter the results by the chosen facet. This seems logical, but the number of search results I get is NOT the same as the count against the facet.

I thought that by adding an fq param I am basically saying (ensuring I keep the q param the same): re-run the search but filter my results where my faceted column has value 'x'. However, as the number of results is not what I am expecting, I believe it may be using the fq param first to define the number of docs against which the q param is subsequently used. But this doesn't seem very intuitive. It would, however, explain the difference between the facet count and the subsequent number of search results that I am observing. Could someone help point out which of the 2 interpretations of fq is correct?
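On the question above: a filter query restricts the documents matched by q to those that also match the filter, so the order in which the two are evaluated should not change which documents come back. One thing worth checking when the counts disagree (an assumption about a possible cause, not a diagnosis) is how the facet value is written into fq: an unquoted multi-word value such as keyword:fund manager is parsed as two clauses, only the first of which targets the field. The helper below, with invented names, quotes and escapes the value.

```python
from urllib.parse import urlencode


def fq_for(field, value):
    """Build a filter query that matches one exact facet value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def drill_down_params(q, field, value):
    """Same dismax query as before, narrowed to one chosen facet value."""
    return [
        ("defType", "dismax"),
        ("q", q),
        ("fq", fq_for(field, value)),
        ("facet", "true"),
        ("facet.field", field),
    ]


params = drill_down_params("isa letter", "keyword", "fund manager")
print(fq_for("keyword", "fund manager"))  # keyword:"fund manager"
print(urlencode(params))
```

If the faceted field is a plain string field and the value is quoted like this, the number of results after drilling down would be expected to equal the facet count shown for that value.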