I used Log Likelihood Similarity and Euclidean distance. My input file contains
string IDs:

CustomerNo,Part No
TR433;SPTBY-1711
TR433;SPTBL-1711
TR433;SPTKP-1711
TR746;TDTBY-861
TR746;TDTBL-861
TR746;TDTKP-861

I converted these to long values using MemoryIDMigrator, like this:

1903325046098094985,5192157078505275458,-3162216497309240828
2276278324672472631,496035984324855953,-3162216497309240828
2276278324672472631,2666580089560192147,-3162216497309240828
2276278324672472631,-3436879215117796241,-3162216497309240828
7260913912542566719,8688228931167592947,-3162216497309240828
7260913912542566719,5860894063367472580,-3162216497309240828

When I used Euclidean distance there were no recommendations, but Log
Likelihood Based Item Similarity gives me results which seem very good.

So, if I use string-based input data for recommendation, do I have to use
"Log likelihood Based Item Similarity"?

Thanks

Ozgur CATAK
Ph.D. Student
Istanbul University, Informatics

On Fri, Dec 11, 2009 at 12:13 PM, Sean Owen <[email protected]> wrote:

> You probably want a user-based recommender since you have very few
> users, relatively. Performance should not be a problem given the size
> of your input -- probably can compute recommendations in tens of
> milliseconds.
>
> You will need to use RecommenderEvaluator to find which of many
> possible implementations produces the best results on your input. For
> example, experiment with a nearest-n user neighborhood with small
> values of n, and try Euclidean distance-based and log-likelihood-based
> similarity metrics. Try several variations and see which produces the
> lowest evaluation score.
>
> On Fri, Dec 11, 2009 at 6:43 AM, F.Ozgur Catak <[email protected]> wrote:
> > approx. 100.000 rows and 2000 users
> >
> > On Fri, Dec 11, 2009 at 2:25 AM, Sean Owen <[email protected]> wrote:
> >
> >> The best algorithm really depends on your data.
> >>
> >> How many items and how many users do you have? That will determine
> >> which algorithms will perform better.
> >>
> >> Which algorithms will produce the best recommendations is hard to
> >> tell. Usually you have to use RecommenderEvaluator with lots of
> >> implementations and your data to find which seems to work best.
> >>
> >> If you can say more about your data, maybe I can guess about the best
> >> implementations to try.
> >>
> >> On Thu, Dec 10, 2009 at 9:56 PM, F.Ozgur Catak <[email protected]> wrote:
> >> > Hi again,
> >> >
> >> > Finally I understand the item similarity :). In our b2b project we
> >> > need to develop a recommendation system. I want to use Mahout. Is
> >> > there any best practice? And another question: is Mahout mature
> >> > enough to use in our production environment?
> >> >
> >> > thanks
> >> >
> >> > On Thu, Dec 10, 2009 at 9:31 PM, Sean Owen <[email protected]> wrote:
> >> >
> >> >> No, the similarity metric is passed in as an ItemSimilarity metric.
> >> >> There is no implementation based on a model, if that's what you mean.
> >> >> What else?
> >> >>
> >> >> On Thu, Dec 10, 2009 at 7:27 PM, F.Ozgur Catak <[email protected]> wrote:
> >> >> > Yes, I read the javadoc but I need the algorithms. For example,
> >> >> > does the recommendation system use the apriori algorithm to find
> >> >> > similar values? etc.
> >> >> >
> >> >> > Maybe it is my problem, because I'm also a newbie at data mining.
> >> >> >
> >> >> > Thanks
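
A note on the Euclidean versus log-likelihood result reported at the top of
this message. In the converted sample, the third column has the same value on
every row. If that column is read as the preference value (an assumption; the
thread does not confirm it), every pair of items has a Euclidean distance of
zero, so a distance-based similarity cannot tell item pairs apart. A
log-likelihood similarity ignores preference values and looks only at who
bought what. The sketch below is a minimal, self-contained illustration of that
difference. It is not Mahout's code: the class name, the 1 / (1 + d) and
1 - 1 / (1 + LLR) mappings, and the co-occurrence counts in main are all
hypothetical choices made for the example.

```java
// Illustrative sketch only; not Mahout's implementation.
class SimilaritySketch {

    // Distance-based similarity 1 / (1 + d) over the preference values two
    // items received from the same users (assumed mapping).
    static double euclideanSimilarity(double[] prefsA, double[] prefsB) {
        double sumSq = 0.0;
        for (int i = 0; i < prefsA.length; i++) {
            double diff = prefsA[i] - prefsB[i];
            sumSq += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: N log N - sum(k log k).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // Log-likelihood ratio for a 2x2 co-occurrence table:
    // k11 = users with both items, k12 = only A, k21 = only B, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    // Map the ratio into [0, 1) (assumed mapping).
    static double logLikelihoodSimilarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // Constant preference values, as in the converted sample's third column.
        double[] constant = {1.0, 1.0, 1.0};
        System.out.println("Euclidean, constant prefs: "
                + euclideanSimilarity(constant, constant));

        // Hypothetical co-occurrence counts for two item pairs among 4 users.
        System.out.println("LLR similarity, always bought together: "
                + logLikelihoodSimilarity(2, 0, 0, 2));
        System.out.println("LLR similarity, independent purchases: "
                + logLikelihoodSimilarity(1, 1, 1, 1));
    }
}
```

With constant preference values the Euclidean figure is the same for every
pair, while the log-likelihood figure changes with the co-occurrence counts.
Whether this is what actually caused the empty result here would need to be
checked against the data model that was loaded.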
