My main goal for reworking the file nomenclature was to make the various
clustering file names follow a consistent naming convention. I don't
expect that to change again any time soon but I noticed that some of the
examples need to be updated to work with trunk (0.4).
On 4/23/10 11:11 AM, Robin Anil wrote:
If you are making more changes, go ahead; you are more than welcome to. Just
fix a convention. For example, in the clustering algorithms chapter it was
points and clusters-[0-n], like you said, and in dirichlet it was state-n. So
it will be better if we stick to a single convention, and the book will
follow (it shouldn't be the other way around).
Robin
On Fri, Apr 23, 2010 at 11:30 PM, Jeff Eastman
<[email protected]> wrote:
The APIs did not change but the clustered points directory changed from
"points" to "clusteredPoints" and the various clusters directories changed
from (e.g. canopies, clusters, clusters-n, canopies-n, state-n) to just
clusters-n, where clusters-0 is used for the initial clusters needed for
kmeans and is produced by canopy output by default.
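For anyone updating scripts or example text against trunk, the renaming described above can be sketched as a small lookup. This is only an illustration, not Mahout code: the helper name `new_dir_name` and the regular expression are hypothetical, and mapping un-numbered directories (such as canopies) to clusters-0 is an assumption based on the note that canopy output produces clusters-0 by default.

```python
import re

# Old cluster directory names mentioned in this thread, with an optional "-n"
# suffix. The pattern is an illustrative assumption, not part of Mahout.
_OLD_CLUSTER_DIR = re.compile(r"^(?:canopies|clusters|state)(?:-(\d+))?$")


def new_dir_name(old_name: str) -> str:
    """Translate an old clustering output directory name to the trunk convention."""
    if old_name == "points":
        return "clusteredPoints"
    match = _OLD_CLUSTER_DIR.match(old_name)
    if match is None:
        # Not one of the clustering directories discussed here; leave it alone.
        return old_name
    # ASSUMPTION: un-numbered directories (e.g. "canopies") become clusters-0,
    # the initial clusters; numbered ones keep their iteration number.
    iteration = match.group(1) or "0"
    return f"clusters-{iteration}"


for name in ["points", "canopies", "clusters-2", "state-5"]:
    print(name, "->", new_dir_name(name))
```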
On 4/23/10 10:25 AM, Robin Anil wrote:
It's not aimed at 0.3 per se. Right now it's evolving with the code. For
example, the quality factor is something that will go in there. I keep
updating the code with the latest changes, and so does Sean. There isn't much
that got affected by your latest commit, though (it compiles). I haven't fully
tested the code with the dataset after the commit, but that is something I
plan to do soon.
Robin
On Fri, Apr 23, 2010 at 9:51 PM, Jeff Eastman <[email protected]> wrote:
I also wonder how much my recent clustering changes have affected the
examples in the clustering sections. I know the book is currently aimed at
Mahout 0.3 but users trying the examples with trunk may be frustrated by the
recent changes in file naming. Do the examples exist in an unannotated
version somewhere that I could get working again on trunk?
On 4/23/10 9:10 AM, Sean Owen wrote:
Good eye, this was fixed in the manuscript a while ago.
I will ping Manning to re-publish Chapters 1-6 since a lot of small
updates have happened since then.
On Fri, Apr 23, 2010 at 4:53 PM, Jeff Eastman
<[email protected]> wrote:
Section 4.5.1 says:
"The third line shows how it is based on item-item similarities, not
user-user similarities as before. The algorithms are similar, but not
entirely symmetric. They do have notably different properties. For instance,
the running time of an item-based recommender scales up as the number of
items increases, whereas a user-based recommender’s running time goes up as
the number of users increases.
This suggests one reason that you might choose an item-based recommender: if
the number of users is relatively low compared to the number of items, the
performance advantage could be significant."
Shouldn't the second paragraph instead read:
"This suggests one reason that you might choose an item-based recommender: if
the number of users is relatively *high* compared to the number of items,
the performance advantage could be significant."
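The scaling argument behind this correction can be made concrete with a toy cost model. This is a rough sketch under stated assumptions, not how Mahout measures or implements anything: the function names are hypothetical, a user-based recommender is assumed to compare the target user with every other user, and an item-based recommender is assumed to compare each candidate item with each item the user has already rated.

```python
def user_based_comparisons(num_users: int) -> int:
    """Assumed cost: one similarity computation per other user."""
    return num_users - 1


def item_based_comparisons(num_items: int, prefs_per_user: int) -> int:
    """Assumed cost: each unrated item is compared with each rated item."""
    candidates = num_items - prefs_per_user
    return candidates * prefs_per_user


# Many users, few items: the item-based count is far smaller.
many_users = (user_based_comparisons(1_000_000),
              item_based_comparisons(1_000, prefs_per_user=10))

# Few users, many items: the user-based count is far smaller.
few_users = (user_based_comparisons(1_000),
             item_based_comparisons(1_000_000, prefs_per_user=10))

print("users >> items (user-based, item-based):", many_users)
print("items >> users (user-based, item-based):", few_users)
```

Under these assumptions the item-based approach only comes out ahead when users greatly outnumber items, which is the *high* reading of the sentence.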