I tried t1=80 and t2=55 (the same values as those specified for the synthetic data). Would you like me to upload the 200/500/1000 document vectors? Those are the sizes at which performance drops non-linearly.
--shashi

On Tue, May 12, 2009 at 5:55 PM, Grant Ingersoll <[email protected]> wrote:
> Yep, saw that. Still would be good to see if there is a way to improve it,
> even for low values. Since we are in the early stages of Mahout, it will be
> really important to develop recommendations, etc. on values for things like
> t1 and t2, so any info we can bring to bear on that will be helpful.
>
> That being said, it should be easy enough to reproduce based on your
> description. What were the values for t1 and t2 you tried?
>
> -Grant
>
> On May 12, 2009, at 7:07 AM, Shashikant Kore wrote:
>
>> Grant,
>>
>> I was using low values for t1 and t2. Increasing these values solves
>> the current problem. Now the problem is to find out optimum values for
>> t1 and t2 for given data set. Please check my previous message on
>> this thread for details.
>>
>> Thanks,
>> --shashi
>>
>> On Tue, May 12, 2009 at 4:26 PM, Grant Ingersoll <[email protected]>
>> wrote:
>>>
>>> Is it possible to share the code and the 100 docs? If not, can you
>>> reproduce with synthetic data?
>>>
>>> -Grant
>>>
>>> On May 11, 2009, at 9:38 AM, Shashikant Kore wrote:
>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search

--
Co-founder, Discrete Log Technologies
http://www.bandhan.com/
