Re: Vetoes and [jira] [Commented] (MAHOUT-881) Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting

Grant Ingersoll Sat, 12 Nov 2011 15:06:31 -0800


On Nov 12, 2011, at 12:58 PM, Sean Owen wrote:

> On Sat, Nov 12, 2011 at 7:25 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> OK, that's reasonable.  I did see the reply and my response was to put up a 
>> patch so we can discuss it concretely instead of have a theoretical 
>> discussion.
> 
> Your JIRA just repeated the same point, and appeared to disregard my
> follow-up. It looks like the stated motivation for the patch was in
> fact not correct. (Yonik provided a different, decent rationalization
> for it.) I think it was right to -1 the patch. and I don't think you
> should have suggested this was an invalid or extreme thing to do.
> Opening a JIRA, which suggests you think there's consensus to make a
> change, seems like the out-of-place thing.

Yes, you are right.  I overreacted and should have explained myself better.  My 
apologies.

At this point, I've seen a variety of results in benchmarking on my laptop, 
some suggesting a difference of nothing (favoring your point of view) to about 
9% (which I think we would like to have!)   Since I don't have faith in these 
tests on my laptop, I am going to re-run when I get home so I have access to my 
benchmark server.  I'll put up the patch containing the benchmarking code so 
others can scrutinize it.  Longer term, I'd like to bring over Lucene's 
benchmark framework, modified for our needs.

> 
> 
>> These 40 lines could probably be a lot less if we didn't have 2 different 
>> classes to track the same two things:  id and score (SimilarUser and 
>> RecommendedItem).  If that were gone, then we could just have one extension 
>> of the PQ.  Likewise for ItemItemSimilarity and UserUserSimilarity 
>> container.  Since that's all hidden from the user, there really is no need 
>> for the distinction.
> 
> This is an orthogonal point -- what's it have to do with your proposed
> patch? I think the patch is fine, but don't think this bolsters the
> argument that it's a big simplification.
> 

I was just responding to the suggestion that the patch was adding a lot more 
lines of code, but I don't think it would if it weren't for the redundancy in 
storage classes.

> You can overload, say, SimilarUser and GenericRecommendedItem. They're
> both an ID and a floating-point value. What do you call it --
> EntityScore? It probably harms readability ever so slightly,
> especially as it also has to implement the RecommendedItem interface.
> toString() would change. I don't object, I don't really think it
> helps.

Perhaps, but they are internal objects, right?

Re: Vetoes and [jira] [Commented] (MAHOUT-881) Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting

Reply via email to