>>Instead of making other APIs to accommodate BloomFilter's current
>>brokenness: remove its custom per-field logic so it works with
>>PerFieldPostingsFormat, like every other PF.
I haven't looked at it in a while, but I'm pretty certain that, like every other PF, you can use PerFieldPF with the Bloom filter just fine. What was broken (and may still be) is that in this configuration PFPF isn't smart enough to avoid creating twice as many files as required; see LUCENE-4093. Until that is resolved (and I have noted my pessimism about it being fixed easily), BloomPF contains an optimisation for those who want to avoid this inefficiency.

Using that optimisation is entirely optional. Inside BloomPF, its implementation is trivial: if a null bloom set is returned for a given field, BloomPF skips the usual bloom-filtering logic and delegates directly to the wrapped codec. You can either implement a BloomFilterFactory that adds this field-choice optimisation or, more simply, run the default PerFieldPF-managed configuration and live with the increased number of files. Arguably, the inefficiencies of the PerFieldPF framework are the real issue to be addressed here.

>>I brought this up before it was committed, and I was ignored

You stopped engaging in the debate when I outlined the three proposed options for moving BloomPF forward: http://goo.gl/mxtP9

Those options were:
1) ignore the inefficiencies in PFPF
2) sort out the issues in PFPF (LUCENE-4093, but probably a more complex solution)
3) work around the existing PFPF issues with a simple but entirely optional optimisation to BloomPF

I opted for 3) and gave notice that I'd take it out if anyone objected. I don't think there's been any movement on 2), so I guess you're still happy with option 1)?
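The null-bloom-set shortcut described above can be modelled in a few lines. This is a minimal, self-contained sketch of the idea, not Lucene's code: `TermLookup`, `SimpleBloomSet`, `BloomSetFactory` and `BloomWrappedLookup` are hypothetical stand-ins (the real lucene-codecs classes are `BloomFilterFactory` and `FuzzySet`, with different signatures). It assumes the factory returns `null` for any field that should bypass bloom filtering, in which case every lookup on that field goes straight to the wrapped delegate.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the wrapped ("delegate") postings format.
interface TermLookup {
    boolean seekExact(String field, String term);
}

// Hypothetical, greatly simplified bloom set: a single hash over a fixed-size bit set.
final class SimpleBloomSet {
    private final BitSet bits;
    private final int size;

    SimpleBloomSet(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    void add(String term) {
        bits.set(Math.floorMod(term.hashCode(), size));
    }

    // false means "definitely absent"; true means "maybe present".
    boolean mayContain(String term) {
        return bits.get(Math.floorMod(term.hashCode(), size));
    }
}

// Hypothetical factory: returning null opts a field out of bloom filtering.
interface BloomSetFactory {
    SimpleBloomSet getSetForField(String field);
}

final class BloomWrappedLookup implements TermLookup {
    private final TermLookup delegate;
    private final Map<String, SimpleBloomSet> setsByField = new HashMap<>();
    int delegateCalls = 0; // how often the wrapped format was consulted

    BloomWrappedLookup(TermLookup delegate, BloomSetFactory factory,
                       Map<String, Set<String>> termsByField) {
        this.delegate = delegate;
        for (Map.Entry<String, Set<String>> e : termsByField.entrySet()) {
            SimpleBloomSet set = factory.getSetForField(e.getKey());
            if (set == null) {
                continue; // null set: this field is never bloom-filtered
            }
            for (String term : e.getValue()) {
                set.add(term);
            }
            setsByField.put(e.getKey(), set);
        }
    }

    @Override
    public boolean seekExact(String field, String term) {
        SimpleBloomSet set = setsByField.get(field);
        if (set != null && !set.mayContain(term)) {
            return false; // "definitely absent": the delegate is never touched
        }
        delegateCalls++;
        return delegate.seekExact(field, term); // null set or "maybe": delegate directly
    }
}
```

For example, a factory such as `f -> f.equals("id") ? new SimpleBloomSet(1024) : null` (the `id` field is hypothetical) keeps bloom filtering on a unique-key field while every other field behaves exactly like the unwrapped format. Because the choice is made inside one wrapper, only one copy of the delegate's data is needed for all fields.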
I recall you didn't think the business of extra files was that much of a concern: http://goo.gl/eJWo3

(Incidentally, it's probably best to follow up on the relevant JIRAs rather than here.)

Cheers
Mark

________________________________
From: Robert Muir <rcm...@gmail.com>
To: dev@lucene.apache.org
Sent: Wednesday, 13 February 2013, 13:01
Subject: Re: New Lucene features and Solr indexes

On Wed, Feb 13, 2013 at 4:42 AM, Adrien Grand <jpou...@gmail.com> wrote:
> Hi Shawn,
>
> On Tue, Feb 12, 2013 at 8:58 PM, Shawn Heisey <s...@elyograg.org> wrote:
>> Some of these, like compressed stored fields and compressed termvectors, are
>> being turned on by default, which is awesome. I'm already running a 4.2
>> snapshot, so I've got those in place.
>
> Excellent!
>
>> One thing that I know I would like to do is use the new BloomFilter for a
>> couple of my fields that contain only unique values. Last time I checked
>> (which was before the 4.1 release), if you added the lucene-codecs jar, Solr
>> had a BloomFilter postings format, but didn't have any way to specify the
>> underlying format. See SOLR-3950 and LUCENE-4394.
>
> BloomFilterPostingsFormat is a little special compared to other
> postings formats because it can wrap any postings format. So maybe it
> should require special support, like an additional attribute in the
> field type definition?

-1

Instead of making other APIs to accommodate BloomFilter's current brokenness: remove its custom per-field logic so it works with PerFieldPostingsFormat, like every other PF.

In other words, it should work just like pulsing.

I brought this up before it was committed, and I was ignored. That's fine, but I'll be damned if I let its incorrect design complicate other parts of the codebase too. I'd rather it continue to stay difficult to integrate and continue walking its current path to an open source death instead.
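Robert's proposal is that Bloom wrapping be chosen per field through PerFieldPostingsFormat, the way pulsing is; Mark's objection is the extra files that routing produces (LUCENE-4093). The trade-off can be illustrated with a toy model. It assumes, as a simplification, that per-field routing writes one full set of delegate files per distinct format name, and that a Bloom wrapper adds its own bloom file on top of its delegate's files. The class, the format names such as `Bloom(Lucene41)`, and the per-format file counts are all hypothetical and do not reflect Lucene's actual on-disk layout.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class PerFieldFileModel {
    // Hypothetical number of files one delegate format writes per segment.
    static final int DELEGATE_FILES = 4;
    // Hypothetical number of extra files a bloom wrapper adds (the bloom sets).
    static final int BLOOM_FILES = 1;

    // Per-field routing: every distinct format name gets its own file set,
    // even when two formats share the same underlying delegate.
    static int perFieldRouting(Map<String, String> formatByField) {
        Set<String> distinctFormats = new LinkedHashSet<>(formatByField.values());
        int files = 0;
        for (String format : distinctFormats) {
            files += DELEGATE_FILES;
            if (format.startsWith("Bloom")) {
                files += BLOOM_FILES;
            }
        }
        return files;
    }

    // Field choice inside a single Bloom wrapper (null sets for unfiltered
    // fields): the delegate's files are written only once.
    static int singleWrappedFormat() {
        return DELEGATE_FILES + BLOOM_FILES;
    }
}
```

With `id` routed to `Bloom(Lucene41)` and `body` to plain `Lucene41`, the per-field model writes the delegate's files twice (9 files in total), whereas the single-wrapper model writes them once (5 files). Fixing the duplication inside PerFieldPostingsFormat, option 2) above, would remove that difference without any Bloom-specific per-field logic.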