Re: Vetoes and [jira] [Commented] (MAHOUT-881) Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting

Grant Ingersoll Sat, 12 Nov 2011 11:25:48 -0800

On Nov 12, 2011, at 8:28 AM, Sean Owen wrote:

> I was surprised to see a JIRA pop up when we had an open thread on the
> topic on the mailing list. Did you see my reply? I believe it had
> answered your questions... then you seemed to be about to make a
> change anyway. I don't think anything is "serious" here, but I suppose
> that's what I was flagging with a -1. Indulge me below first please.


OK, that's reasonable.  I did see the reply, and my response was to put up a 
patch so we can discuss it concretely instead of having a theoretical discussion.

> 
> I am sure this can easily be agreed on. I detest classic Apache-style
> JIRA fights, I'm not picking any such thing!
> 
> 
>> Just because some piece of code isn't considered a bottleneck at the moment, 
>> does that mean it can't be improved?  This new code removes a fair bit of 
>> complexity (the code was passing over the same data at least 3 times in some 
>> cases, when it only needed to pass over it once) and puts in place an 
>> approach that takes far fewer operations AND has a faster underlying data 
>> structure.  It's a no-brainer.  I would think we would be happy that we are 
>> finally at the point where doing these kinds of optimizations is a 
>> worthwhile exercise.
> 
> No disagreement about optimizing and improving stuff even for its own
> sake. The question is whether this is an improvement and I'm not sure
> I agree with that reasoning (see next). This adds a net 40 lines of
> code -- I am not sure that's a simplification? It's no big deal
> either, but I suppose I am not seeing that win.

These 40 lines could probably be a lot fewer if we didn't have 2 different 
classes to track the same two things:  id and score (SimilarUser and 
RecommendedItem).  If that were gone, then we could just have one extension of 
the PQ.  Likewise for the ItemItemSimilarity and UserUserSimilarity containers.  
Since that's all hidden from the user, there really is no need for the 
distinction.
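
To make the idea concrete, here is a rough sketch of what a single id/score 
container plus one bounded queue could look like.  This is an illustration 
only, not the MAHOUT-881 patch: the names ScoredId and TopN are hypothetical, 
and it uses java.util.PriorityQueue as a stand-in rather than Lucene's 
PriorityQueue.  It keeps the best n entries in one pass over the candidates 
and sorts only those n at the end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical single container for an id and a score, standing in for the
// separate SimilarUser and RecommendedItem classes mentioned above.
final class ScoredId {
    final long id;
    final double score;

    ScoredId(long id, double score) {
        this.id = id;
        this.score = score;
    }
}

// Hypothetical helper: selects the n highest-scoring entries in a single pass.
final class TopN {
    static List<ScoredId> select(Iterable<ScoredId> candidates, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<ScoredId> byScore =
            Comparator.comparingDouble((ScoredId s) -> s.score);
        // Min-heap: the weakest of the current top n sits at the head,
        // so it is the one to evict when a better candidate arrives.
        PriorityQueue<ScoredId> heap = new PriorityQueue<>(n, byScore);
        for (ScoredId c : candidates) {
            if (heap.size() < n) {
                heap.add(c);
            } else if (c.score > heap.peek().score) {
                heap.poll();
                heap.add(c);
            }
        }
        // Only the n survivors are sorted, best first.
        List<ScoredId> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }
}
```

With something like this, the same queue could serve both the user-similarity 
and item-recommendation cases, since both only need an id and a score.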
