Re: Mahout In Action

Jeff Eastman Fri, 23 Apr 2010 11:01:20 -0700

The APIs did not change, but the clustered points directory changed from "points" to "clusteredPoints", and the various clusters directories changed from (e.g. canopies, clusters, clusters-n, canopies-n, state-n) to just clusters-n, where clusters-0 is used for the initial clusters needed for kmeans and is produced by canopy output by default.
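
For anyone adapting older example paths, the renaming described above can be sketched roughly as follows. This is only an illustration and not part of Mahout: the function name `new_dir_name` and its `iteration` parameter are hypothetical, and the iteration number for an old directory without a "-n" suffix is an assumption the caller has to supply, since the rename rule itself does not say which n applies.

```python
import re

# Older cluster directory names listed in the message; numbered forms such
# as "canopies-2" or "state-5" carry an iteration suffix.
_OLD_CLUSTER_DIR = re.compile(r"^(canopies|clusters|state)(?:-(\d+))?$")


def new_dir_name(old_name, iteration=None):
    """Map an older output directory name to the newer naming (hypothetical helper).

    `iteration` is only used when the old name has no "-n" suffix; which
    iteration such a directory corresponds to is an assumption made by the caller.
    """
    if old_name == "points":
        return "clusteredPoints"
    match = _OLD_CLUSTER_DIR.match(old_name)
    if match is None:
        return old_name  # not one of the renamed directories
    n = match.group(2) if match.group(2) is not None else iteration
    if n is None:
        raise ValueError(f"no iteration number known for {old_name!r}")
    return f"clusters-{int(n)}"
```

For instance, `new_dir_name("canopies", iteration=0)` reflects the case where canopy output serves as the initial clusters for kmeans.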


On 4/23/10 10:25 AM, Robin Anil wrote:

It's not aimed at 0.3 per se. Right now it's evolving with the code. For
example, the quality factor is something that will go in there. I keep updating
the code with the latest changes and so does Sean. Not much was affected by
your latest commit, though (it compiles). I haven't fully tested the code with
the dataset since the commit; that's something I plan to do soon.


Robin

On Fri, Apr 23, 2010 at 9:51 PM, Jeff Eastman <[email protected]> wrote:

I also wonder how much my recent clustering changes have affected the
examples in the clustering sections. I know the book is currently aimed at
Mahout 0.3 but users trying the examples with trunk may be frustrated by the
recent changes in file naming. Do the examples exist in an unannotated
version somewhere that I could get working again on trunk?

On 4/23/10 9:10 AM, Sean Owen wrote:

Good eye, this was fixed in the manuscript a while ago.

I will ping Manning to re-publish Chapters 1-6 since a lot of small
updates have happened since then.

On Fri, Apr 23, 2010 at 4:53 PM, Jeff Eastman <[email protected]> wrote:

Section 4.5.1 says:
"The third line shows how it is based on item-item similarities, not
user-user similarities as before. The algorithms are similar, but not
entirely symmetric. They do have notably different properties. For instance,
the running time of an item-based recommender scales up as the number of
items increases, whereas a user-based recommender's running time goes up as
the number of users increases.

This suggests one reason that you might choose an item-based recommender: if
the number of users is relatively low compared to the number of items, the
performance advantage could be significant."

Shouldn't the second paragraph be?

"This suggests one reason that you might choose an item-based recommender: if
the number of users is relatively *high* compared to the number of items, the
performance advantage could be significant."
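
As a rough illustration of the corrected reading, the rule of thumb in the quoted passage could be written like this. It is a sketch only: the function name is hypothetical, and it encodes nothing beyond the passage's scaling claim; it is not a measured cost model.

```python
def likely_cheaper_recommender(num_users, num_items):
    """Rule of thumb from the quoted passage (hypothetical helper)."""
    # Item-based running time grows with the number of items,
    # user-based running time grows with the number of users.
    if num_items < num_users:
        return "item-based"
    if num_users < num_items:
        return "user-based"
    return "either"
```

With many users and comparatively few items, this returns "item-based", which matches the proposed wording with *high*.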
