Re: Shared Stored Field
Erick Erickson wrote: "So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in another doc, and so on?"
Well, yes, that could work, but it would mean a lot of unique dynamic fields, roughly as many as there are documents in our system, and I am not sure that is good practice.
-- View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130589.html Sent from the Solr - User mailing list archive at Nabble.com.
Shared Stored Field
Hello,
We have a denormalized index in which certain documents essentially point to the same content. The relevance of these documents depends on the current context: for example, document A has a different boost factor when we apply filter F1 than when we apply filter F2 (or F3, etc.).
To support this, we denormalize document A into one copy per filter, each with its own boost field, so that it has a different relevance for each filter it can be found under.
The problem is that these documents have a large stored content field, which is required for the highlighting snippets. In the worst case, this denormalization grows the index size by a factor of 100. Storing the same large content field so many times seems really inefficient.
Is there a way to point a group of documents to the same stored content fields? Or is there a different way to influence relevance depending on the current search context?
Re: Shared Stored Field
Hmmm, I only skimmed your question, so maybe I missed something. It sounds like you have a fixed number of filters known at index time, right? So why not index these boosts in separate fields in the document (e.g. f1_boost, f2_boost, etc.) and use a function query (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at query time to boost by the correct one?
Of course I may be way off base here. BTW, you could use dynamic fields so you don't have to pre-define the maximum number of boost fields, something like this in my example:
<dynamicField name="*_boost" type="float" indexed="true" stored="false"/>
Best,
Erick
On Thu, Apr 10, 2014 at 4:30 AM, StrW_dev r.j.bamb...@structweb.nl wrote:
[quoted text snipped]
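A minimal sketch of the query-time half of this suggestion, in Python. It assumes the edismax query parser (whose boost parameter multiplies the relevance score by a function) and a hypothetical naming convention in which filter value "f1" keeps its per-document boost in the dynamic field "f1_boost"; the filter field name FILTER is also an assumption. The def() function falls back to 1 for documents that carry no boost for the active filter, so they are not zeroed out.

```python
from urllib.parse import urlencode


def boost_params(user_query: str, active_filter: str) -> dict:
    # Hypothetical convention: filter value "f1" keeps its per-document
    # boost in the dynamic field "f1_boost" (matched by *_boost).
    boost_field = f"{active_filter}_boost"
    return {
        "q": user_query,
        "defType": "edismax",
        # Hypothetical filter field name.
        "fq": f"FILTER:{active_filter}",
        # edismax multiplies each document's score by this function;
        # def() returns 1 for documents that have no value in boost_field.
        "boost": f"def({boost_field},1)",
    }


if __name__ == "__main__":
    print(urlencode(boost_params("laptop", "f1")))
```

The point is that the client building the request already knows which filter is active, so picking the matching boost field is just string formatting on that side; the stored content only needs to exist once per document.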
Re: Shared Stored Field
Erick Erickson wrote: "So why not index these boosts in separate fields in the document (e.g. f1_boost, f2_boost, etc.) and use a function query (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at query time to boost by the correct one?"
Well, it's basically one multivalued field that can take an unlimited number of distinct values, with several per document (about 8 on average). In that case we would have to add a boost field for each of the values in a document, so across the index we would end up with an unlimited number of dynamic fields.
But is it possible to select a different boost field depending on the current filter query?
Re: Shared Stored Field
bq: But is it possible to select a different boost field depending on the current filter query?
Well, you're constructing the URL somewhere; you can choose the right boost there, can't you?
I don't understand this bit:
bq: Well it's basically one multivalued field that can have unlimited values and has multiple per document (on average like 8)
The _values_ aren't at issue; it's just the name of the field. You can have lots of dynamic fields defined in your documents and it's not too expensive. Don't go wild, though: when you get up into the hundreds, you should probably think about it a bit.
I feel I'm missing something; some concrete examples would help a lot.
Best,
Erick
On Thu, Apr 10, 2014 at 7:33 AM, StrW_dev r.j.bamb...@structweb.nl wrote:
[quoted text snipped]
Re: Shared Stored Field
Erick Erickson wrote: "Well, you're constructing the URL somewhere, you can choose the right boost there can't you?"
Yes, of course! As an example: we have one filter field called FILTER, which can take an unlimited number of values across all documents. Each document has on average 8 values set for FILTER (e.g. FILTER = [1, 2, ..., 8]). So we could add a boost field for each of these values, such as B_1:1.0, ..., B_7:5.0, and use those at query time. That is your suggestion, correct?
So each document has on average 8 of these dynamic fields, while across the whole index there is an unbounded number of them. What would this mean for performance?
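A rough sketch, in Python, of the index-side scheme described above: one document per piece of content, carrying its FILTER values plus a B_<value> boost field for each. The field names "id" and "content" are hypothetical, and the B_<value> fields assume a float dynamic field such as B_* is declared in the schema. The second helper simply counts the distinct boost field names across a set of documents, which is the quantity the performance question is about.

```python
def build_doc(doc_id: str, content: str, boosts: dict) -> dict:
    """Build one document (instead of one copy per filter value).

    boosts maps each FILTER value of this document to its boost in that
    context. "id" and "content" are hypothetical field names; B_<value>
    assumes a float dynamic field such as B_* is declared in the schema.
    """
    doc = {"id": doc_id, "content": content, "FILTER": sorted(boosts)}
    for value, boost in boosts.items():
        doc[f"B_{value}"] = float(boost)
    return doc


def distinct_boost_fields(docs: list) -> set:
    # Each distinct FILTER value in the index introduces one more field name.
    return {name for doc in docs for name in doc if name.startswith("B_")}


docs = [
    build_doc("A", "large stored text", {1: 1.0, 7: 5.0}),
    build_doc("B", "more stored text", {7: 2.0, 9: 1.5}),
]
print(distinct_boost_fields(docs))
```

At query time, a filter on FILTER:7 would then be paired with a boost on B_7. The large content field is stored once per document, but the number of distinct field names grows with the number of distinct FILTER values; whether that count stays manageable is exactly the open question in this message.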
Re: Shared Stored Field
So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in another doc, and so on? What's so confusing is that in your first e-mail you said:
bq: This denormalization grows the index size by a factor of 100 in the worst case.
which I took to mean you have at most 100 of these fields.
Please look at the function query page I referenced and try a few things so we can deal with specific questions. You can put the results of a _query_ in a function query, so you could probably just form a sub-query that returns a score, which you in turn use to boost the doc.
Best,
Erick
On Thu, Apr 10, 2014 at 8:04 AM, StrW_dev r.j.bamb...@structweb.nl wrote:
[quoted text snipped]
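One possible reading of the sub-query suggestion, sketched in Python as request parameters. It relies on Solr's query(subquery, default) function, which returns a document's score for the sub-query (or the default when the document does not match), with the sub-query passed through parameter dereferencing. The parameter name "ctxq", the FILTER field, and the sub-query on "context_field" are all hypothetical placeholders; designing a sub-query that actually encodes the context-specific boost is left open, as it is in the thread.

```python
from urllib.parse import urlencode


def subquery_boost_params(user_query: str, filter_value: str, context_query: str) -> dict:
    """Boost each hit by the score of a separate, context-specific sub-query.

    Assumes edismax. "ctxq" is a hypothetical parameter name that the boost
    function dereferences as $ctxq; context_query is whatever sub-query you
    design to express the current context.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        "fq": f"FILTER:{filter_value}",
        # query(subquery, default) yields the sub-query's score for a document,
        # or the default (0 here) when the document does not match it.
        # Adding 1 keeps the multiplicative boost >= 1, so non-matching
        # documents keep their original score.
        "boost": "sum(1,query($ctxq,0))",
        "ctxq": context_query,
    }


# Placeholder sub-query on a hypothetical field.
params = subquery_boost_params("solr rocks", "7", "{!lucene}context_field:7")
print(urlencode(params))
```

Wrapping query() in sum(1, ...) is a design choice for this sketch: with a multiplicative boost, a raw sub-query score below 1 would otherwise demote matching documents relative to non-matching ones.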