Re: New Lucene features and Solr indexes

mark harwood Wed, 13 Feb 2013 07:03:59 -0800

>>Instead of making other APIs to accommodate BloomFilter's current
>>brokenness: remove its custom per-field logic so it works with
>>PerFieldPostingsFormat, like every other PF.

Not looked at it in a while but I'm pretty certain, like every other PF, you 
can go ahead and use PerFieldPF with Bloom filter just fine.

What was broken was (is?) that in this configuration PFPF isn't smart enough to 
avoid creating twice as many files as is required - see LUCENE-4093.
Until that is resolved (and I have noted my pessimism about that being fixed 
easily) BloomPF contains an optimisation for those that want to avoid this 
inefficiency.
The use of that optimisation is entirely optional for users.
Internally to BloomPF, the implementation of that optimisation is trivial - if 
a null bloom set is returned for a given field, it ignores the usual bloom 
filtering logic and delegates directly to the wrapped codec. 
You can choose to implement a BloomFilterFactory that adds this field-choice 
optimisation or, more simply, run the default PerFieldPF-managed configuration 
and live with the increased number of files.
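
For anyone who wants to see the shape of that null-set delegation, here is a 
minimal, self-contained sketch. It is not Lucene code: ToyBloomSet, 
ToyBloomFactory, ToyDelegate and ToyBloomWrapper are hypothetical stand-ins 
invented for illustration, the "bloom set" is an exact set rather than a real 
bit set, and none of the real BloomFilterFactory signatures are reproduced.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BloomDelegationSketch {

    /** Hypothetical stand-in for a bloom set (an exact set here, to keep the sketch short). */
    static final class ToyBloomSet {
        private final Set<String> terms = new HashSet<>();
        void add(String term) { terms.add(term); }
        boolean mightContain(String term) { return terms.contains(term); }
    }

    /** Hypothetical factory: returning null means "no bloom filtering for this field". */
    interface ToyBloomFactory {
        ToyBloomSet setForField(String field);
    }

    /** Hypothetical wrapped format that counts how often it is consulted. */
    static final class ToyDelegate {
        private final Map<String, Set<String>> postings = new HashMap<>();
        int lookups = 0;

        void add(String field, String term) {
            postings.computeIfAbsent(field, f -> new HashSet<>()).add(term);
        }

        boolean seekExact(String field, String term) {
            lookups++;
            Set<String> terms = postings.get(field);
            return terms != null && terms.contains(term);
        }
    }

    /** Wrapper: uses the bloom set when one exists, otherwise delegates directly. */
    static final class ToyBloomWrapper {
        private final ToyDelegate delegate;
        private final ToyBloomFactory factory;
        private final Map<String, ToyBloomSet> blooms = new HashMap<>();

        ToyBloomWrapper(ToyDelegate delegate, ToyBloomFactory factory) {
            this.delegate = delegate;
            this.factory = factory;
        }

        void add(String field, String term) {
            if (!blooms.containsKey(field)) {
                blooms.put(field, factory.setForField(field)); // value may be null
            }
            ToyBloomSet bloom = blooms.get(field);
            if (bloom != null) {
                bloom.add(term);
            }
            delegate.add(field, term);
        }

        boolean seekExact(String field, String term) {
            ToyBloomSet bloom = blooms.get(field);
            if (bloom == null) {
                // Null bloom set: skip the bloom logic and go straight to the delegate.
                return delegate.seekExact(field, term);
            }
            if (!bloom.mightContain(term)) {
                return false; // fast fail without touching the delegate
            }
            return delegate.seekExact(field, term);
        }
    }

    /** Indexes one term, looks up a missing one, and reports how often the delegate was hit. */
    static int delegateLookupsForMiss(String field) {
        ToyDelegate delegate = new ToyDelegate();
        // Assumption for the demo: only the "id" field gets a bloom set.
        ToyBloomWrapper wrapper = new ToyBloomWrapper(
                delegate, f -> f.equals("id") ? new ToyBloomSet() : null);
        wrapper.add(field, "doc-1");
        wrapper.seekExact(field, "doc-2"); // a term that was never indexed
        return delegate.lookups;
    }

    public static void main(String[] args) {
        System.out.println("id:   " + delegateLookupsForMiss("id"));
        System.out.println("body: " + delegateLookupsForMiss("body"));
    }
}
```

In this toy setup a miss on the bloom-filtered "id" field never reaches the 
delegate, while a miss on "body" (null set) goes straight through to it once.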

Arguably, the inefficiencies of the PerFieldPF framework are the real issue to 
be addressed here.

>>I brought this up before it was committed, and I was ignored

You stopped engaging in the debate when I outlined the 3 proposed options for 
moving BloomPF forward :  http://goo.gl/mxtP9
Those options were:
1) ignore the inefficiencies in PFPF
2) sort out the issues in PFPF (4093 but probably a more complex solution)
3) work around existing PFPF issues with a simple but entirely optional 
optimisation to BloomPF

I opted for 3) and gave notice that I'd take it out if anyone objected. 
I don't think there's been any movement on 2) so I guess you're still happy 
with option 1)? I recall you didn't think the business of extra files was that 
much of a concern: http://goo.gl/eJWo3

(Incidentally, it's probably best to follow up on the relevant Jiras rather than 
here)

Cheers
Mark

________________________________
 From: Robert Muir <rcm...@gmail.com>
To: dev@lucene.apache.org 
Sent: Wednesday, 13 February 2013, 13:01
Subject: Re: New Lucene features and Solr indexes

On Wed, Feb 13, 2013 at 4:42 AM, Adrien Grand <jpou...@gmail.com> wrote:
> Hi Shawn,
>
> On Tue, Feb 12, 2013 at 8:58 PM, Shawn Heisey <s...@elyograg.org> wrote:
>> Some of these, like compressed stored fields and compressed termvectors, are
>> being turned on by default, which is awesome.  I'm already running a 4.2
>> snapshot, so I've got those in place.
>
> Excellent!
>
>> One thing that I know I would like to do is use the new BloomFilter for a
>> couple of my fields that contain only unique values.  Last time I checked
>> (which was before the 4.1 release), if you added the lucene-codecs jar, Solr
>> had a BloomFilter postings format, but didn't have any way to specify the
>> underlying format.  See SOLR-3950 and LUCENE-4394.
>
> BloomFilterPostingsFormat is a little special compared to other
> postings formats because it can wrap any postings format. So maybe it
> should require special support, like an additional attribute in the
> field type definition?

-1

Instead of making other APIs to accommodate BloomFilter's current
brokenness: remove its custom per-field logic so it works with
PerFieldPostingsFormat, like every other PF.

In other words, it should work just like pulsing.

I brought this up before it was committed, and I was ignored. That's
fine, but I'll be damned if I let its incorrect design complicate
other parts of the codebase too. I'd rather it continue to stay
difficult to integrate and continue walking its current path to an
open source death instead.

