Re: New logo

2017-05-06 Thread Scott C. Cote
Will you be wearing “one of those t-shirts” on Monday in Houston :) ? SCott Scott C. Cote scottcc...@gmail.com 972.672.6484 > On May 6, 2017, at 1:52 PM, Ted Dunning <ted.dunn...@gmail.com> wrote: > I know where one of those t-shirts is. > On Sat, May

Re: streaming kmeans vs incremental canopy/solr/kmeans

2015-01-22 Thread Scott C. Cote
they did). On Fri, Feb 14, 2014 at 10:50 AM, Scott C. Cote scottcc...@gmail.com wrote: Right now - I'm dealing with only 40,000 documents, but we will eventually grow more than 10x (put on the manager hat and say 1 mil docs) where a doc is usually no longer than 20 or 30 words. SCott On 2

Re: canopy creating canopies with the same points

2014-03-24 Thread Scott C. Cote
Reinis, The documentation has several Jiras open - one with my name on it. Fortunately, the canopy cluster technology has a good page (as well as some outdated pages). Please see this link for your question: http://mahout.apache.org/users/clustering/canopy-clustering.html as I

Re: canopy creating canopies with the same points

2014-03-24 Thread Scott C. Cote
(this was suggested in some post as a method to quickly find a T2 that gives a particular number of canopies). You mention Jiras you opened (gonna check them right after) - could it be one of them is for this special T1 == T2 case? br reinis On 24.03.2014 15:28, Scott C. Cote wrote: Reinis
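For readers following the T1/T2 question, here is a minimal Java sketch of the textbook canopy procedure: take the first remaining point as a center, put every remaining point closer than T1 into its canopy, and drop every point closer than T2 from the candidate list. It is an illustration, not Mahout's canopy implementation; the class and method names, the Euclidean distance, and the sample points are assumptions. In this variant, T1 == T2 means the points that join a canopy are exactly the points removed from the list, so the canopies stop overlapping.

```java
import java.util.ArrayList;
import java.util.List;

class CanopySketch {

    // Euclidean distance (an assumption; canopy clustering only needs a cheap metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Textbook canopy clustering; requires t1 >= t2.
    // Each canopy is a list of point indices whose first entry is the canopy center.
    static List<List<Integer>> canopies(double[][] points, double t1, double t2) {
        if (t1 < t2) {
            throw new IllegalArgumentException("T1 must be >= T2");
        }
        List<Integer> remaining = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            remaining.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int center = remaining.remove(0);
            List<Integer> canopy = new ArrayList<>();
            canopy.add(center);
            List<Integer> stillRemaining = new ArrayList<>();
            for (int p : remaining) {
                double d = distance(points[center], points[p]);
                if (d < t1) {
                    canopy.add(p);          // loosely bound: joins this canopy
                }
                if (d >= t2) {
                    stillRemaining.add(p);  // not tightly bound: may join or seed later canopies
                }
            }
            remaining = stillRemaining;
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {1, 0}, {2.5, 0}, {10, 0}};
        System.out.println(canopies(pts, 3.0, 1.5)); // overlapping canopies
        System.out.println(canopies(pts, 1.5, 1.5)); // T1 == T2: a partition
    }
}
```

With the sample points, canopies(pts, 3.0, 1.5) returns [[0, 1, 2], [2], [3]] (point 2 belongs to two canopies), while canopies(pts, 1.5, 1.5) returns [[0, 1], [2], [3]].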

Re: Website, urgent help needed

2014-03-13 Thread Scott C. Cote
of the page there. Best, Sebastian On 03/12/2014 03:27 PM, Scott C. Cote wrote: I took the tour of the text analysis and pushed through despite the problems on the page. Committers helped me over the hump where others might have just given up (to your point). When I did it, I made shell scripts so

Re: Website, urgent help needed

2014-03-12 Thread Scott C. Cote
I took the tour of the text analysis and pushed through despite the problems on the page. Committers helped me over the hump where others might have just given up (to your point). When I did it, I made shell scripts so that my steps would be repeatable, in anticipation of updating the page.

Re: Website, urgent help needed

2014-03-12 Thread Scott C. Cote
and a committer will put that into the website. Does that work for you? PS: There are a lot of online markdown editors out there. On 03/12/2014 03:27 PM, Scott C. Cote wrote: I took the tour of the text analysis and pushed through despite the problems on the page. Committers helped me over

Re: Website, urgent help needed

2014-03-12 Thread Scott C. Cote
ok On 3/12/14, 9:58 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Thanks Scott; please just attach your work to an issue in the Jira system; if there's not one already you could file a new issue. On Mar 12, 2014, at 7:44 AM, Scott C. Cote scottcc...@gmail.com wrote: I’ll make

Re: Welcome Andrew Musselman as new committer

2014-03-07 Thread Scott C. Cote
I personally am looking forward to the “advice from the newest “recommended” committer to hadoop. Congratulations to Mahout team for increasing and growing :) Now back to my using …. (and hopefully creating something meaningful for you guys) Scott PS: am bootstrapping my Machine Learning

Re: Rework our website

2014-03-06 Thread Scott C. Cote
committers can change the website unfortunately. If you have a text to add, I'm happy to work it in and add your name to our contributors list in the CHANGELOG. Best, Sebastian On 03/05/2014 04:58 PM, Scott C. Cote wrote: I had recently taken the text tour of mahout, but I couldn't decipher a way

Re: Rework our website

2014-03-05 Thread Scott C. Cote
I had recently taken the text tour of Mahout, but I couldn't decipher a way to contribute updates to the tour (some of the file names have changed, etc). How would I start? (this was part of my offer to help with the documentation of Mahout). SCott On 3/5/14 9:47 AM, Pat Ferrel

streaming kmeans vs incremental canopy/solr/kmeans

2014-02-14 Thread Scott C. Cote
Hello All, I have two questions (Q1, Q2). Q1: Am digging into Text Analysis and am wrestling with competing analyzed data maintenance strategies. NOTE: my text comes from a very narrowly focused source. - Am currently crunching the data (batch) using the following scheme: 1. Load source text

Re: get similar items

2014-02-14 Thread Scott C. Cote
+the+Mahout+command+line'. It looks like the case I described. But I am using Java with a MySQL database; is there an example related to this? Thanks. -- Original -- From: Scott C. Cote;scottcc...@gmail.com; Date: Wed, Feb 12, 2014 11:47 PM To: user

Re: streaming kmeans vs incremental canopy/solr/kmeans

2014-02-14 Thread Scott C. Cote
? How much do you plan to have? On Fri, Feb 14, 2014 at 8:04 AM, Scott C. Cote scottcc...@gmail.com wrote: Hello All, I have two questions (Q1, Q2). Q1: Am digging into Text Analysis and am wrestling with competing analyzed data maintenance strategies. NOTE: my text comes from a very

Re: get similar items

2014-02-12 Thread Scott C. Cote
Since you are relying on unguided data - switch from recommenders/classifiers to clustering. Anyone else agree with me on this??? SCott On 2/12/14 9:04 AM, Martin, Nick nimar...@pssd.com wrote: Yeah, since it would appear you're lacking requisite data for recommenders the only other thing I can

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Scott C. Cote
frequency in the vectorization process. What is the command you are using to create vectors from your tokenized documents? Drew On Tue, Jan 21, 2014 at 6:30 PM, Scott C. Cote scottcc...@gmail.com wrote: All, Not a Mahout .9 problem - once I have this working with .8 Mahout, will immediately pull

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Scott C. Cote
to update the documentation. On Sunday, January 26, 2014 1:34 PM, Scott C. Cote scottcc...@gmail.com wrote: Drew, I'm sorry - I'm derelict (as opposed to dirichlet) in responding that I got past my problem. It was the min freq that was killing me. Forgot about that parameter. Thank you
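Why a minimum-frequency setting can quietly empty a small data set: the Java sketch below builds TF-IDF vectors but first drops any term whose total count across the corpus is below a cutoff. The cutoff name minSupport, the class name, and the three skill phrases are hypothetical, and Mahout's own vectorizer has its own option names and defaults (check its --help output); the point is only that in a corpus of short phrases most terms occur once, so a cutoff of 2 removes nearly everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TfIdfSketch {

    // Turns tokenized documents into TF-IDF vectors (term -> weight).
    // Terms whose total count across the corpus is below minSupport (hypothetical name)
    // are dropped before weighting.
    static List<Map<String, Double>> vectorize(List<List<String>> docs, int minSupport) {
        Map<String, Integer> corpusCount = new HashMap<>(); // total occurrences per term
        Map<String, Integer> docFreq = new HashMap<>();     // documents containing the term
        for (List<String> doc : docs) {
            for (String term : doc) {
                corpusCount.merge(term, 1, Integer::sum);
            }
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        int numDocs = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) {
                if (corpusCount.get(term) >= minSupport) { // the pruning step
                    tf.merge(term, 1, Integer::sum);
                }
            }
            Map<String, Double> vector = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) numDocs / docFreq.get(e.getKey()));
                vector.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<List<String>> skills = Arrays.asList(
                Arrays.asList("java", "developer"),
                Arrays.asList("python", "developer"),
                Arrays.asList("project", "manager"));
        System.out.println(vectorize(skills, 2)); // most terms pruned
        System.out.println(vectorize(skills, 1)); // every term kept
    }
}
```

With a cutoff of 2, only "developer" survives and the third phrase gets an empty vector; with a cutoff of 1 every term is kept. The IDF here is the plain log(N / df), so a term present in every document would get weight 0.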

Re: Running Mahout Example

2014-01-22 Thread Scott C. Cote
To eliminate the MAHOUT_LOCAL stack traces, I set the env var to an arbitrary value:
    export MAHOUT_HOME=~/mahout
    export MAHOUT_LOCAL=yes
    export PATH=$PATH:${MAHOUT_HOME}/bin
On 1/22/14 9:50 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: What's ur Mahout version? On Wednesday, January

Problem converting tokenized documents into TFIDF vectors

2014-01-21 Thread Scott C. Cote
All, Not a Mahout .9 problem - once I have this working with .8 Mahout, will immediately pull in the .9 stuff…. I am trying to make a small data set work (perhaps it is too small?) where I am clustering skills (phrases). For the sake of brevity (my steps are long), I have not documented the steps

Re: need help explaining difference in k means output

2014-01-06 Thread Scott C. Cote
and the points associated with each cluster. The ClusteredPoints will be generated in the last iteration and will have the info about the clusters and associated points for each cluster. Best, Mahesh Balija. On Sun, Jan 5, 2014 at 1:59 AM, Scott C. Cote scottcc...@gmail.com wrote: All, When I run

need help explaining difference in k means output

2014-01-04 Thread Scott C. Cote
All, When I run the Kmeans analysis from the command line,
#
# added the -cd option per instructions in the Mahout In Action (MiA) so the convergence threshold is .1
# instead of default value of .5 because cosines lie within 0 and 1.
#
# maximum number of iterations is 10
#
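The reasoning in that comment can be made concrete. As we understand Mahout's k-means, a cluster counts as converged when its centroid moves no more than the convergence delta between iterations, measured with the configured distance measure. The minimal Java sketch below (class and method names are hypothetical; this is not Mahout's code) shows that check with cosine distance, 1 - cos(a, b). Note that cosine distance lies in [0, 1] only for non-negative vectors such as TF-IDF; for arbitrary real vectors the range is [0, 2].

```java
class KMeansConvergenceSketch {

    // Cosine distance = 1 - cos(a, b). Assumes neither vector is all zeros.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // True when every centroid moved no more than convergenceDelta since the previous iteration.
    static boolean converged(double[][] oldCentroids, double[][] newCentroids, double convergenceDelta) {
        for (int k = 0; k < oldCentroids.length; k++) {
            if (cosineDistance(oldCentroids[k], newCentroids[k]) > convergenceDelta) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] before = {{1.0, 0.0}};
        double[][] after = {{1.0, 1.0}};  // centroid rotated 45 degrees: distance about 0.293
        System.out.println(converged(before, after, 0.5)); // loose delta: would stop here
        System.out.println(converged(before, after, 0.1)); // tighter delta: keeps iterating
    }
}
```

A smaller delta therefore demands that centroids settle more tightly before the run stops, which typically costs more iterations (bounded by the maximum iteration count).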

Re: Equality of two DenseMatrix objects

2013-12-29 Thread Scott C. Cote
Ted - thank you for taking the time to point out that in Multivariate Systems, there are many interpretations to what would seem ordinary and non-debatable in scalar mathematics. For example, in the relational algebra world, I know of seven different interpretations of relational division. SCott
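One way to see Ted's point on a concrete case: the Java sketch below compares matrices stored as plain double[][] arrays (not Mahout's DenseMatrix; class and method names are hypothetical) under two defensible definitions of equality, bit-for-bit and within a tolerance. A value computed as 0.1 + 0.2 fails the first test against 0.3 and passes the second.

```java
import java.util.Arrays;

class MatrixEqualitySketch {

    // Interpretation 1: same shape and bit-for-bit identical entries.
    // Arrays.deepEquals compares double[] rows via Double.doubleToLongBits,
    // so NaN equals NaN but 0.0 does not equal -0.0.
    static boolean exactEquals(double[][] a, double[][] b) {
        return Arrays.deepEquals(a, b);
    }

    // Interpretation 2: same shape and every entry within an absolute tolerance.
    // Written as !(diff <= tolerance) so that NaN entries are never "close".
    static boolean approxEquals(double[][] a, double[][] b, double tolerance) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i].length != b[i].length) {
                return false;
            }
            for (int j = 0; j < a[i].length; j++) {
                if (!(Math.abs(a[i][j] - b[i][j]) <= tolerance)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] computed = {{0.1 + 0.2, 1.0}};
        double[][] expected = {{0.3, 1.0}};
        System.out.println(exactEquals(computed, expected));        // 0.1 + 0.2 != 0.3 in binary floating point
        System.out.println(approxEquals(computed, expected, 1e-9));
    }
}
```

Neither definition is "the" right one; which to use depends on whether the caller cares about representation (exact) or numerical value (tolerance).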

Mahout In Action - NewsKMeansClustering sample not generating clusters

2013-12-27 Thread Scott C. Cote
Hello Mahout Trainers and Gurus: I am plowing through the sample code from Mahout in Action. Have been trying to run the example NewsKMeansClustering using the Reuters dataset. Found Alex Ott's Blog http://alexott.blogspot.co.uk/2012/07/getting-started-with-examples-from.html And downloaded

Re: Mahout In Action - NewsKMeansClustering sample not generating clusters

2013-12-27 Thread Scott C. Cote
(); while (reader.next(key, value)) { System.out.println(key.toString() + " belongs to cluster " + value.toString()); } reader.close(); } } I'm running out of ideas. … SCott From: Scott C. Cote scottcc...@gmail.com Date: Friday, December 27, 2013 1:56 PM To: user@mahout.apache.org user

Questions related to MiA and Quick tour of text analysis …

2013-12-23 Thread Scott C. Cote
All, Two questions related to Quick tour of text analysis using the Mahout command line 1. metrics: When moving through the process of performing the cluster analysis - one can use many different metrics. In the tour, the choice was made to use the Cosine metric. Are there any problems that
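A commonly cited reason for the cosine choice in text clustering is that it ignores document length. In this Java sketch (hypothetical class name, made-up term counts), a document and a copy with every count doubled are far apart by Euclidean distance but at cosine distance of roughly 0, while a document with a different term mix looks closer under Euclidean distance.

```java
class MetricChoiceSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // 1 - cos(a, b); assumes neither vector is all zeros.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] shortDoc = {1, 2, 0};  // term counts of a short document
        double[] longDoc = {2, 4, 0};   // same term mix, twice as long
        double[] otherDoc = {0, 1, 1};  // different term mix
        System.out.println(euclidean(shortDoc, longDoc));       // about 2.236
        System.out.println(euclidean(shortDoc, otherDoc));      // about 1.732: Euclidean prefers otherDoc
        System.out.println(cosineDistance(shortDoc, longDoc));  // about 0: cosine prefers longDoc
        System.out.println(cosineDistance(shortDoc, otherDoc)); // about 0.368
    }
}
```

Whether that length invariance is desirable depends on the data; for documents of very different lengths it usually is, which is one plausible reason the tour uses cosine.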

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
. On Thursday, December 19, 2013 2:04 PM, Scott C. Cote scottcc...@gmail.com wrote: I manually deleted the temp folder too (After 2 failed starts). Would it be helpful for me to upload my shells that encapsulate all of the commands posted on the tour? They reflect the current state of reuters and .8

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
on the wiki link instructions, the seqdumper should have been on rowsimilarity/part-r-* and not on matrix/matrix for determining similar documents. Hope this helps. Sorry again for the confusion. On Friday, December 20, 2013 4:51 PM, Scott C. Cote scottcc...@gmail.com wrote: Suneel
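The correction above (dump the rowsimilarity output, not matrix/matrix, to find similar documents) reflects that the matrix only holds one vector per document; similarity is a separate computation. Here is a minimal brute-force Java sketch of what a row-similarity step yields: for each row of a document-term matrix, the most similar other rows by cosine similarity. It is an in-memory illustration, not Mahout's RowSimilarityJob; the names and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RowSimilaritySketch {

    // Cosine similarity; an all-zero row gets similarity 0 to everything.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // For every row, the indices of the topN most similar other rows, most similar first.
    // Compares every pair of rows, so it only suits small matrices.
    static List<List<Integer>> mostSimilar(double[][] rows, int topN) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            final double[] current = rows[i];
            List<Integer> others = new ArrayList<>();
            for (int j = 0; j < rows.length; j++) {
                if (j != i) {
                    others.add(j);
                }
            }
            // Descending similarity; ties keep ascending index order because List.sort is stable.
            others.sort((x, y) -> Double.compare(cosine(current, rows[y]), cosine(current, rows[x])));
            result.add(new ArrayList<>(others.subList(0, Math.min(topN, others.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows are documents, columns are terms (made-up weights).
        double[][] docs = {
                {1, 1, 0, 0},
                {1, 1, 1, 0},
                {0, 0, 1, 1},
                {0, 0, 0, 1}};
        System.out.println(mostSimilar(docs, 1)); // [[1], [0], [3], [2]]
    }
}
```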

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
20, 2013 4:51 PM, Scott C. Cote scottcc...@gmail.com wrote: Suneel and others, I am still getting the strange results when I do the tour. Suneel: I manually wiped out the temp folder and also deleted the reuters-XXX folders. Also, per your advice I added the -ow option to all of the commands

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
/clusteredPoints \ I am assuming you had run kmeans clustering, if so the clusters wouldn't overlap. You would see cluster overlap if you were to run fuzzy kmeans clustering. On Friday, December 20, 2013 7:06 PM, Scott C. Cote scottcc...@gmail.com wrote: Suneel, Thank you for your help
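To illustrate the overlap point for a single point: plain k-means gives each point to its nearest centroid, whereas fuzzy k-means gives it a membership weight in every cluster, so the same point can show up under several clusters. The Java sketch below uses the standard fuzzy c-means membership formula, u_k = 1 / Σ_j (d_k / d_j)^(2 / (m - 1)) with fuzziness m > 1; the class and method names are hypothetical and the code is not taken from Mahout's source.

```java
import java.util.Arrays;

class MembershipSketch {

    // k-means: the point belongs to exactly one cluster, the nearest centroid.
    static int hardAssignment(double[] distances) {
        int best = 0;
        for (int k = 1; k < distances.length; k++) {
            if (distances[k] < distances[best]) {
                best = k;
            }
        }
        return best;
    }

    // Fuzzy c-means membership of one point in each cluster, given its distances
    // to the centroids and fuzziness m > 1. The memberships sum to 1.
    static double[] fuzzyMemberships(double[] distances, double m) {
        double[] u = new double[distances.length];
        for (int k = 0; k < distances.length; k++) {
            if (distances[k] == 0.0) {  // point sits on a centroid: full membership there
                Arrays.fill(u, 0.0);
                u[k] = 1.0;
                return u;
            }
        }
        double exponent = 2.0 / (m - 1.0);
        for (int k = 0; k < distances.length; k++) {
            double sum = 0.0;
            for (int j = 0; j < distances.length; j++) {
                sum += Math.pow(distances[k] / distances[j], exponent);
            }
            u[k] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        double[] distances = {1.0, 3.0};  // distances from one point to two centroids
        System.out.println(hardAssignment(distances));
        System.out.println(Arrays.toString(fuzzyMemberships(distances, 2.0)));
    }
}
```

For distances 1 and 3 with m = 2, the memberships come out at about 0.9 and 0.1; larger m pushes them toward equal shares.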

unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Scott C. Cote
All, I am a newbie Mahout user and am trying to use the Quick tour of text analysis using the Mahout command line. Thank you to whoever contributed to that page. https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line Went all the way

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Scott C. Cote
with 21578 rows and 41807 columns to reuters-matrix/matrix Dec 18, 2013 4:01:13 PM org.slf4j.impl.JCLLoggerAdapter info INFO: Program took 3453 ms (Minutes: 0.05755) On Thursday, December 19, 2013 12:14 PM, Scott C. Cote scottcc...@gmail.com wrote: All, I am a newbie Mahout user and am trying

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Scott C. Cote
to see the output. b) Also what was the message at the end of the RowId job. It should read something like 'Wrote out matrix with 21578 rows and 19515 columns to reuters-matrix/matrix'. On Thursday, December 19, 2013 12:14 PM, Scott C. Cote scottcc...@gmail.com wrote: All, I am a newbie
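As far as we understand the rowid step, it re-keys each document vector with a consecutive integer row id, writes the vectors out as the matrix, and keeps a separate doc index from row id back to the original document key; the row count is then the number of documents and the column count is the vector cardinality from vectorization. If that reading is right, a column count different from the expected one suggests the vectors themselves were built differently. The Java sketch below is an in-memory stand-in with hypothetical names (no Hadoop SequenceFiles involved).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowIdSketch {

    // Result of re-keying: an int-indexed matrix plus the row id -> document key index.
    static final class Result {
        final double[][] matrix;      // row i holds the vector of the i-th document
        final List<String> docIndex;  // docIndex.get(i) is the original key of row i
        final int numCols;            // largest vector cardinality seen

        Result(double[][] matrix, List<String> docIndex, int numCols) {
            this.matrix = matrix;
            this.docIndex = docIndex;
            this.numCols = numCols;
        }
    }

    // Assigns consecutive row ids 0, 1, 2, ... in the map's iteration order.
    static Result rowId(Map<String, double[]> keyedVectors) {
        double[][] matrix = new double[keyedVectors.size()][];
        List<String> docIndex = new ArrayList<>();
        int numCols = 0;
        int row = 0;
        for (Map.Entry<String, double[]> e : keyedVectors.entrySet()) {
            matrix[row++] = e.getValue();
            docIndex.add(e.getKey());
            numCols = Math.max(numCols, e.getValue().length);
        }
        return new Result(matrix, docIndex, numCols);
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = new LinkedHashMap<>(); // keeps insertion order
        vectors.put("doc-b", new double[] {0.0, 1.0, 0.0});
        vectors.put("doc-a", new double[] {2.0, 0.0, 0.5});
        Result r = rowId(vectors);
        System.out.println("rows=" + r.matrix.length + " cols=" + r.numCols + " docIndex=" + r.docIndex);
    }
}
```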

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Scott C. Cote
that). It should be good enough to run the Rowsimilarity job again. On Thursday, December 19, 2013 1:46 PM, Scott C. Cote scottcc...@gmail.com wrote: Suneel, I'm going to do the similarity part of the tour over - my laptop was put to sleep in the middle of the run of the rowsimilarity job. Maybe the job