Hi Shawn,

On Tue, Feb 12, 2013 at 8:58 PM, Shawn Heisey <[email protected]> wrote:
> Some of these, like compressed stored fields and compressed termvectors, are
> being turned on by default, which is awesome. I'm already running a 4.2
> snapshot, so I've got those in place.
Excellent!

> One thing that I know I would like to do is use the new BloomFilter for a
> couple of my fields that contain only unique values. Last time I checked
> (which was before the 4.1 release), if you added the lucene-codecs jar, Solr
> had a BloomFilter postings format, but didn't have any way to specify the
> underlying format.

See SOLR-3950 and LUCENE-4394. BloomFilterPostingsFormat is a little special compared to other postings formats because it can wrap any postings format. So maybe it should require special support, like an additional attribute in the field type definition?

> Another new feature that is coming soon to Solr is DocValues - SOLR-3855.
> Looking at the issue, I was not able to tell what situations would be
> appropriate for using the feature.

Doc values are like FieldCache, except that the values are built at indexing time, so you don't need to uninvert them from the inverted index whenever you open a new Reader. I see two reasons why you would want to turn doc values on:
- if you are indexing a field only for faceting, sorting or grouping (not searching), setting indexed=false and docValues=true will provide the same functionality and be lighter, both at indexing time (no need to invert the field) and when opening a new IndexReader (no need to uninvert the field);
- if the field is also used for searching, turning doc values on will mean a little more work at indexing time (not a big deal in my opinion), but your index will be faster to open (especially interesting if you're doing near-realtime search) and likely more memory-efficient.

However, doc values are not used for searching, so there is no need to turn them on for a field that is used solely for searching.

Like stored fields, doc values can also be used to retrieve the value of a field, but the trade-off is very different: stored fields are efficient at retrieving many fields of a single document, while doc values are efficient at retrieving one field for many documents.
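On the point that a Bloom filter can wrap any postings format, here is a toy Java sketch of the idea. It is not Lucene's actual implementation or API: TermLookup, MapLookup and BloomLookup are hypothetical names, and the double-hashing scheme is an assumption chosen for brevity. A Bloom filter built over all terms of a field sits in front of an arbitrary delegate, answers "definitely absent" without touching it, and forwards possible hits unchanged. That is presumably why it helps on unique-value fields such as ids: a given id lives in at most one segment, so most per-segment lookups are misses.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model of a Bloom filter wrapping an arbitrary term lookup (hypothetical names, not Lucene's API).
public class BloomWrapperSketch {

    // Hypothetical stand-in for "a postings format": term -> doc ids, or null if absent.
    interface TermLookup {
        int[] postings(String term);
    }

    // Any underlying implementation; it counts how often it is actually consulted.
    static final class MapLookup implements TermLookup {
        private final Map<String, int[]> map;
        int calls = 0;

        MapLookup(Map<String, int[]> map) {
            this.map = map;
        }

        @Override
        public int[] postings(String term) {
            calls++;
            return map.get(term);
        }
    }

    // The wrapper only relies on the TermLookup interface, so it can sit in front of any delegate.
    static final class BloomLookup implements TermLookup {
        private final TermLookup delegate;
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomLookup(TermLookup delegate, Iterable<String> terms, int numBits, int numHashes) {
            this.delegate = delegate;
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
            // Set numHashes bits for every indexed term.
            for (String t : terms) {
                for (int i = 0; i < numHashes; i++) {
                    bits.set(slot(t, i));
                }
            }
        }

        // Double hashing (assumed scheme): slot_i = h1 + i * h2 (mod numBits).
        private int slot(String term, int i) {
            int h1 = term.hashCode();
            int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;
            return Math.floorMod(h1 + i * h2, numBits);
        }

        @Override
        public int[] postings(String term) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(slot(term, i))) {
                    return null; // a bit is unset: the term was definitely never indexed
                }
            }
            return delegate.postings(term); // possibly indexed (or a false positive): ask the delegate
        }
    }

    public static void main(String[] args) {
        Map<String, int[]> ids = new HashMap<>();
        ids.put("id-1", new int[] {0});
        ids.put("id-2", new int[] {1});
        MapLookup inner = new MapLookup(ids);
        TermLookup wrapped = new BloomLookup(inner, ids.keySet(), 1024, 3);

        System.out.println(wrapped.postings("id-2")[0]); // 1
        System.out.println(wrapped.postings("id-999"));  // null
        System.out.println("delegate calls: " + inner.calls);
    }
}
```

A Bloom filter never gives false negatives, so wrapping cannot hide an indexed term; a false positive only costs one delegate lookup that returns nothing.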
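To illustrate the FieldCache comparison and the stored-fields trade-off, here is a toy Java model using plain arrays and maps, not Lucene's data structures or API. All names are hypothetical, and it assumes a single-valued field. invert() builds term -> doc ids, as indexing does for searching. uninvert() is the FieldCache-style pass that rebuilds doc -> value and would have to be repeated for each new reader. A doc-values column is simply written once at indexing time. The row-per-document list stands in for stored fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java collections, not Lucene's on-disk formats.
public class DocValuesSketch {

    // Inverted index (used for searching): term -> ids of the documents containing it.
    static Map<String, List<Integer>> invert(String[] column) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < column.length; doc++) {
            index.computeIfAbsent(column[doc], t -> new ArrayList<>()).add(doc);
        }
        return index;
    }

    // FieldCache-style uninversion: walk every term's postings to rebuild doc -> value.
    // This full pass is what has to be repeated each time a new reader is opened.
    static String[] uninvert(Map<String, List<Integer>> index, int maxDoc) {
        String[] values = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            for (int doc : e.getValue()) {
                values[doc] = e.getKey();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Doc values: one column per field, written at indexing time (doc id -> value).
        String[] category = {"books", "music", "books", "games"};
        String[] title = {"t0", "t1", "t2", "t3"};

        // Stored fields: one record per document holding all of its stored fields.
        List<Map<String, String>> stored = new ArrayList<>();
        for (int doc = 0; doc < category.length; doc++) {
            Map<String, String> row = new TreeMap<>();
            row.put("category", category[doc]);
            row.put("title", title[doc]);
            stored.add(row);
        }

        // Same doc -> value mapping either way; doc values just skip the uninversion pass.
        System.out.println(Arrays.equals(uninvert(invert(category), category.length), category)); // true

        // Many fields of one document (e.g. building a response): read one stored record.
        System.out.println(stored.get(2)); // {category=books, title=t2}

        // One field for many documents (e.g. faceting or sorting): scan one column.
        Map<String, Integer> facetCounts = new TreeMap<>();
        for (String value : category) {
            facetCounts.merge(value, 1, Integer::sum);
        }
        System.out.println(facetCounts); // {books=2, games=1, music=1}
    }
}
```

With a multi-valued field, uninvert() as written would keep only the last term seen per document. Real uninversion needs extra bookkeeping for that case, which this sketch ignores.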
So if you want to get a field's value in the response, you should keep setting stored=true. There might be optimizations in the future, for example when you only ask for a single field which has doc values, but these would be transparent to you.

--
Adrien
