Re: Shared Stored Field

2014-04-11 Thread StrW_dev
Erick Erickson wrote:
> So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in
> another doc etc?

Well yes, that could work, but this would mean we get a lot of unique dynamic
fields, basically equal to the number of documents in our system, and I am
not sure that is good practice.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130589.html
Sent from the Solr - User mailing list archive at Nabble.com.


Shared Stored Field

2014-04-10 Thread StrW_dev
Hello,

We have a denormalized index where certain documents point in essence to the
same content. 
The relevance of the documents depends on the current context. E.g. document
A has a different boost factor when we apply filter F1 compared to when we
use filter F2 (or F3, etc).
To support this we denormalize document A with a unique boost field, such
that it has a different relevance for each filter it can be found in.

The problem is that the documents have a big stored content that is required
for the highlighting snippets.

This denormalization grows the index size by a factor of 100 in the worst case.
Storing the same big content field many times seems really
inefficient.
Is there a way to point a group of documents to the same stored content
fields? 
Or is there a different way to influence the relevance depending on the
current search context?



Re: Shared Stored Field

2014-04-10 Thread Erick Erickson
Hmmm, I scanned your question, so maybe I missed something. It sounds
like you have a fixed number of filters known at index time, right? So
why not index these boosts in separate fields in the document (e.g.
f1_boost, f2_boost etc) and use a function query
(https://cwiki.apache.org/confluence/display/solr/Function+Queries) at
query time to boost by the correct one?

Of course I may be way off base here, but...

BTW, you could use dynamic fields to not have to pre-define the
maximum number of boost fields, something like this in my example:
<dynamicField name="*_boost" type="float" indexed="true" stored="false"/>
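
A rough sketch of what the indexing side of this suggestion might look like, in Python. The helper name `build_doc`, the `id` and `content` field names, and the sample boost values are assumptions made for illustration; only the `*_boost` naming pattern comes from the dynamic field above.

```python
def build_doc(doc_id, content, boosts):
    """Build one Solr document (as a plain dict) carrying a boost per filter.

    `boosts` maps a filter name such as "f1" to its boost factor; each entry
    becomes a field like "f1_boost", which the *_boost dynamic field matches.
    """
    doc = {"id": doc_id, "content": content}  # field names are assumptions
    for filter_name, factor in boosts.items():
        doc[f"{filter_name}_boost"] = float(factor)
    return doc


# Hypothetical sample data: one document with boosts for two filters.
doc = build_doc("A", "big stored content ...", {"f1": 2.0, "f2": 0.5})
print(doc)
```

With this layout the big content field would be stored once per document, and only the small boost fields vary per filter.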

Best
Erick



Re: Shared Stored Field

2014-04-10 Thread StrW_dev
Erick Erickson wrote:
> So why not index these boosts in separate fields in the document (e.g.
> f1_boost, f2_boost etc) and use a function query
> (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at
> query time to boost by the correct one?

Well, it's basically one multivalued field that can have unlimited values and
has multiple per document (on average about 8). In that case we would have to
add a boost field for each of the values in the document, so overall we would
get an unlimited number of dynamic fields in the index.

But is it possible to select a different boost field depending on the
current filter query?



Re: Shared Stored Field

2014-04-10 Thread Erick Erickson
bq: But is it possible to select a different boost field depending on
the current filter query?

Well, you're constructing the URL somewhere; you can choose the right
boost there, can't you?
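
As a hedged illustration of choosing the boost while constructing the request, here is a minimal Python sketch. It assumes the edismax query parser with its multiplicative `boost` parameter and the `def()` function from the function query page; the helper name `build_query_params` and the `filter_id` field used in the filter query are hypothetical.

```python
from urllib.parse import urlencode


def build_query_params(user_query, active_filter):
    """Return Solr request parameters that boost by the active filter's field."""
    boost_field = f"{active_filter}_boost"
    return {
        "q": user_query,
        "defType": "edismax",
        # Restrict to documents that belong to the active filter
        # ("filter_id" is a hypothetical field name).
        "fq": f"filter_id:{active_filter}",
        # Multiply the score by the per-filter boost; def() falls back to 1.0
        # for documents that have no value in this field.
        "boost": f"def({boost_field},1.0)",
    }


params = build_query_params("solr highlighting", "f1")
print("/select?" + urlencode(params))
```

Switching the active filter from `f1` to `f2` only changes the field name passed to `def()`, so the same documents can rank differently per filter.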

I don't understand this bit:
Well, it's basically one multivalued field that can have unlimited
values and has multiple per document (on average about 8)

The _values_ aren't at issue, it's just the name of the field. You can
have lots of dynamic fields defined in your documents and it's not too
expensive. Don't go wild here, though: once you get up into the hundreds
of fields, you should think about it a bit.

I feel I'm missing something, some concrete examples would help a lot.

Best,
Erick



Re: Shared Stored Field

2014-04-10 Thread StrW_dev
Erick Erickson wrote:
> Well, you're constructing the URL somewhere, you can choose the right
> boost there can't you?

Yes of course!

As an example:
We have one filter field called FILTER which can have unlimited values across
all documents.
Each document has on average 8 values set for FILTER (e.g. FILTER
[1,2,..,8]).
So we could add a boost field for each of these values, e.g. B_1:1.0,
...  ,B_7:5.0, and use those at query time. This is your
suggestion, correct?

So each document has on average 8 of these dynamic fields, while over the
whole index we have an unlimited number of these fields. What would this mean
for performance?
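
To make the two counts in this question concrete, here is a small Python sketch with made-up data. It only illustrates the difference between the number of boost fields per document and the number of distinct boost field names across the index; it says nothing about how Solr performs in either case. The helper name and the toy documents are assumptions.

```python
def boost_field_names(filter_values):
    """Map FILTER values such as [1, 2, 3] to field names B_1, B_2, B_3."""
    return {f"B_{value}" for value in filter_values}


# Three toy documents, each with its own FILTER values (made-up data).
docs = {
    "doc1": [1, 2, 3],
    "doc2": [3, 4, 5],
    "doc3": [6, 7, 8],
}

per_doc = {doc_id: boost_field_names(values) for doc_id, values in docs.items()}
all_fields = set().union(*per_doc.values())

for doc_id, fields in per_doc.items():
    print(doc_id, "has", len(fields), "boost fields")
print("index-wide distinct boost fields:", len(all_fields))
```

Each document stays small, but the set of distinct field names grows with the number of distinct FILTER values in the index.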



Re: Shared Stored Field

2014-04-10 Thread Erick Erickson
So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in
another doc etc?

What's so confusing is that in your first e-mail, you said:
bq: This denormalization grows the index size by a factor of 100 in the worst case.

Which I took to mean you have at most 100 of these fields.

Please look at the function query page I referenced and try a few
things so we can deal with specific questions. You can put the results
of a _query_ in a function query, so you could probably just form a
sub-query that returns a score that you in turn use to boost the doc.
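
A minimal Python sketch of the sub-query idea, assuming the edismax `boost` parameter and the `query()` function with parameter dereferencing as described on the function query page. The parameter name `boostq`, the helper name, and the example sub-query `FILTER:3` are assumptions; how the sub-query's score should encode a per-filter weight is left open here.

```python
def build_subquery_boost_params(user_query, boost_subquery):
    """Return Solr parameters that boost by the score of a separate query."""
    return {
        "q": user_query,
        "defType": "edismax",
        # query($boostq,1) evaluates to the score of the query held in the
        # "boostq" parameter, or to 1 for documents that do not match it.
        "boost": "query($boostq,1)",
        "boostq": boost_subquery,
    }


params = build_subquery_boost_params("solr highlighting", "FILTER:3")
for name, value in params.items():
    print(f"{name}={value}")
```

This keeps the set of field names fixed, since the boost comes from a query score rather than from a per-filter field.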

Best,
Erick
