Here are 300 documents (the Google Docs size limit was reached at this point): http://docs.google.com/Doc?id=dc5kkrf9_111htmscqp3
If you run with t1=80 and t2=55, this will run for a few minutes.

--shashi

On Tue, May 12, 2009 at 6:02 PM, Shashikant Kore <[email protected]> wrote:
> I tried t1=80 and t2=55 (same as the numbers specified for synthetic
> data). Would you like me to upload the 200/500/1000 document vectors?
> That's where performance drops non-linearly.
>
> --shashi
>
> On Tue, May 12, 2009 at 5:55 PM, Grant Ingersoll <[email protected]> wrote:
>> Yep, saw that. Still would be good to see if there is a way to improve it,
>> even for low values. Since we are in the early stages of Mahout, it will be
>> really important to develop recommendations, etc. on values for things like
>> t1 and t2, so any info we can bring to bear on that will be helpful.
>>
>> That being said, it should be easy enough to reproduce based on your
>> description. What were the values for t1 and t2 you tried?
>>
>> -Grant
>>
>> On May 12, 2009, at 7:07 AM, Shashikant Kore wrote:
>>
>>> Grant,
>>>
>>> I was using low values for t1 and t2. Increasing these values solves
>>> the current problem. Now the problem is to find out optimum values for
>>> t1 and t2 for given data set. Please check my previous message on
>>> this thread for details.
>>>
>>> Thanks,
>>> --shashi
>>>
>>> On Tue, May 12, 2009 at 4:26 PM, Grant Ingersoll <[email protected]>
>>> wrote:
>>>>
>>>> Is it possible to share the code and the 100 docs? If not, can you
>>>> reproduce with synthetic data?
>>>>
>>>> -Grant
>>>>
>>>> On May 11, 2009, at 9:38 AM, Shashikant Kore wrote:
>>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>
> --
> Co-founder, Discrete Log Technologies
> http://www.bandhan.com/

--
Co-founder, Discrete Log Technologies
http://www.bandhan.com/
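
For anyone trying to reproduce this without the document vectors, here is a rough sketch of why low thresholds might hurt so much. It assumes t1 and t2 are the loose and tight distance thresholds of canopy clustering (the thread does not name the algorithm), and it is a plain single-machine Python illustration, not Mahout's implementation. The names `canopy_cluster`, `euclidean`, and `synthetic_points`, as well as the uniform 2-D test data, are made up for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_cluster(points, t1, t2, distance=euclidean):
    """Textbook canopy clustering sketch (hypothetical helper, not Mahout code).

    t1 is the loose threshold (canopy membership), t2 the tight one
    (removal from the candidate list); t1 must be greater than t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next candidate as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)  # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly close: stays a candidate
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


def synthetic_points(n, dims=2, spread=100.0, seed=42):
    """Uniform random points; an assumed stand-in for real document vectors."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0, spread) for _ in range(dims)) for _ in range(n)]


if __name__ == "__main__":
    pts = synthetic_points(300)
    for t1, t2 in [(8, 5.5), (80, 55)]:
        print(f"t1={t1} t2={t2} canopies={len(canopy_cluster(pts, t1, t2))}")
```

In this sketch each new canopy costs one pass over the remaining candidates, so the total work grows with the number of canopies. A small t2 removes few candidates per pass and produces many canopies, which may be one reason the run time drops off non-linearly as the document count grows.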
