Will you be wearing “one of those t-shirts” on Monday in Houston :) ?
SCott
Scott C. Cote
scottcc...@gmail.com
972.672.6484
> On May 6, 2017, at 1:52 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> I know where one of those t-shirts is.
>
>
>
> On Sat, May
they did).
On Fri, Feb 14, 2014 at 10:50 AM, Scott C. Cote
scottcc...@gmail.com wrote:
Right now - I'm dealing with only 40,000 documents, but we will
eventually
grow more than 10x (put on the manager hat and say 1 mil docs) where a
doc
is usually no longer than 20 or 30 words.
SCott
On 2
Reinis,
The documentation has several Jiras open - with one with my name on it.
Fortunately, the canopy cluster technology has a good page (as well as
some outdated pages).
Please see this link for your question:
http://mahout.apache.org/users/clustering/canopy-clustering.html
as I
(this was suggested in some post as a method to
find in a fast way T2 that gives particular number of canopies. You
mention jiras you opened (gonna check them right after) - could it be
one of them is for this special T1 == T2 case?
br
reinis
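To make the T1 == T2 question concrete, here is a minimal sketch of the classic canopy procedure in plain Java. It is not Mahout's implementation: the class name `CanopySketch`, the Euclidean distance, and the choice to test only the points still on the list are all assumptions made for illustration. It shows how the number of canopies can be counted for a candidate threshold, and that with T1 == T2 every point lands in exactly one canopy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of canopy generation; not Mahout code. */
final class CanopySketch {

    /** Euclidean distance; assumes both points have the same dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns canopies as lists of point indices; the first index is the center.
     * Points within t1 of the center join the canopy, and points within t2
     * are removed from the list of future candidate centers.
     */
    static List<List<Integer>> canopies(double[][] points, double t1, double t2) {
        if (t1 < t2) {
            throw new IllegalArgumentException("t1 must be >= t2");
        }
        List<Integer> remaining = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            remaining.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int center = remaining.remove(0); // remove by position, not by value
            List<Integer> canopy = new ArrayList<>();
            canopy.add(center);
            Iterator<Integer> it = remaining.iterator();
            while (it.hasNext()) {
                int idx = it.next();
                double d = distance(points[center], points[idx]);
                if (d <= t1) {
                    canopy.add(idx);
                }
                if (d <= t2) {
                    it.remove();
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {1}, {5}, {6}, {20}};
        // With T1 == T2 the canopies are disjoint, so the count depends only on the threshold.
        System.out.println(canopies(points, 2.0, 2.0).size());
        System.out.println(canopies(points, 10.0, 10.0).size());
    }
}
```

Sweeping the threshold and reading off `canopies(...).size()` is one way to look for a T2 that yields a target number of canopies, under the assumptions stated above.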
On 24.03.2014 15:28, Scott C. Cote wrote:
Reinis
of the
page there.
Best,
Sebastian
On 03/12/2014 03:27 PM, Scott C. Cote wrote:
I took the tour of the text analysis and pushed through despite the
problems on the page. Committers helped me over the hump where others
might have just given up (to your point).
When I did it, I made shell scripts so that my steps would be repeatable
with an anticipation of updating the page.
and a committer will
put that into the website.
Does that work for you?
PS: There are a lot of online markdown editors out there.
ok
On 3/12/14, 9:58 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:
Thanks Scott; please just attach your work to an issue in the Jira
system; if there's not one already you could file a new issue.
On Mar 12, 2014, at 7:44 AM, Scott C. Cote scottcc...@gmail.com
wrote:
I’ll make
I personally am looking forward to the "advice" from the newest
"recommended" committer to hadoop.
Congratulations to Mahout team for increasing and growing :)
Now back to my using . (and hopefully creating something meaningful for
you guys)
Scott
PS: am bootstrapping my Machine Learning
committers can change the website unfortunately. If
you have a text to add, I'm happy to work it in and add your name to our
contributors list in the CHANGELOG.
Best,
Sebastian
On 03/05/2014 04:58 PM, Scott C. Cote wrote:
I had recently taken the text tour of mahout, but I couldn't decipher a
way to contribute updates to the tour (some of the file names have
changed, etc).
How would I start? (this was part of my offer to help with the
documentation of Mahout).
SCott
On 3/5/14 9:47 AM, Pat Ferrel
Hello All,
I have two questions (Q1, Q2).
Q1: Am digging in to Text Analysis and am wrestling with competing analyzed
data maintenance strategies.
NOTE: my text comes from a very narrowly focused source.
- Am currently crunching the data (batch) using the following scheme:
1. Load source text
+the+Mahout+command+line'.
It looks like the case I described. But I am using Java with a MySQL
database; is there an example related to this?
thanks.
-- Original --
From: Scott C. Cote;scottcc...@gmail.com;
Date: Wed, Feb 12, 2014 11:47 PM
To: user
?
How much do you plan to have?
On Fri, Feb 14, 2014 at 8:04 AM, Scott C. Cote scottcc...@gmail.com
wrote:
Hello All,
I have two questions (Q1, Q2).
Q1: Am digging in to Text Analysis and am wrestling with competing
analyzed
data maintenance strategies.
NOTE: my text comes from a very
Since you are relying on unguided data - switch from
recommenders/classifier to clustering.
Anyone else agree with me on this???
SCott
On 2/12/14 9:04 AM, Martin, Nick nimar...@pssd.com wrote:
Yeah, since it would appear you're lacking requisite data for
recommenders the only other thing I can
frequency in the
vectorization process. What is the command you are using to create vectors
from your tokenized documents?
Drew
On Tue, Jan 21, 2014 at 6:30 PM, Scott C. Cote scottcc...@gmail.com
wrote:
All,
Not a Mahout .9 problem; once I have this working with .8 Mahout, I will
immediately pull
to update the documentation.
On Sunday, January 26, 2014 1:34 PM, Scott C. Cote scottcc...@gmail.com
wrote:
Drew,
I'm sorry - I'm derelict (as opposed to dirichlet) in responding that I
got past my problem.
It was the min freq that was killing me. Forgot about that parameter.
Thank you
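As an illustration of why a minimum-frequency cutoff can hurt a very small data set, the following sketch drops every term whose corpus-wide count falls below a threshold. It is plain Java, not the Mahout vectorizer; the class name `MinFrequencySketch`, the method names, and the sample skill phrases are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of minimum-frequency pruning; not Mahout code. */
final class MinFrequencySketch {

    /** Counts how often each term occurs across all tokenized documents. */
    static Map<String, Integer> countTerms(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps only the terms whose corpus-wide count is at least minFreq. */
    static List<List<String>> prune(List<List<String>> docs, int minFreq) {
        Map<String, Integer> counts = countTerms(docs);
        List<List<String>> pruned = new ArrayList<>();
        for (List<String> doc : docs) {
            List<String> kept = new ArrayList<>();
            for (String term : doc) {
                if (counts.get(term) >= minFreq) {
                    kept.add(term);
                }
            }
            pruned.add(kept);
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<List<String>> skills = List.of(
                List.of("java", "sql"),
                List.of("java", "hadoop"),
                List.of("scala"));
        // With a cutoff of 2, only "java" survives and the third document becomes empty.
        System.out.println(prune(skills, 2));
    }
}
```

In this toy corpus a cutoff of 2 leaves the third document with no terms at all, which is the kind of effect that can make a small clustering run look broken.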
To eliminate the MAHOUT_LOCAL stack traces, I set the env var to an
arbitrary value.
export MAHOUT_HOME=~/mahout
export MAHOUT_LOCAL=yes
export PATH=$PATH:${MAHOUT_HOME}/bin
On 1/22/14 9:50 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:
What's ur Mahout version?
On Wednesday, January
All,
Not a Mahout .9 problem; once I have this working with .8 Mahout, I will
immediately pull in the .9 stuff.
I am trying to make a small data set work (perhaps it is too small?) where I
am clustering skills (phrases). For the sake of brevity (my steps are long), I
have not documented the steps
and the points associated with each cluster.
The ClusteredPoints will be generated in the last iteration and will have
the info about the clusters and associated points for each cluster.
Best,
Mahesh Balija.
On Sun, Jan 5, 2014 at 1:59 AM, Scott C. Cote scottcc...@gmail.com
wrote:
All,
When I run
All,
When I run the Kmeans analysis from the command line,
#
# added the -cd option per the instructions in Mahout in Action (MiA) so the
# convergence threshold is .1
# instead of the default value of .5 because cosines lie within 0 and 1.
#
# maximum number of iterations is 10
#
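The remark that cosines lie within 0 and 1 holds for vectors with non-negative entries, such as term-frequency vectors. A minimal plain-Java sketch of the cosine distance (1 minus the cosine of the angle between two vectors) is shown below; the class name `CosineDistanceSketch` is hypothetical and this is not Mahout's distance measure class.

```java
/** Hypothetical helper showing the cosine distance; not Mahout code. */
final class CosineDistanceSketch {

    /** Returns 1 - cos(angle between a and b); requires equal length and non-zero vectors. */
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] docA = {1, 2, 0};
        double[] docB = {2, 4, 0}; // same direction as docA, so distance is about 0
        double[] docC = {0, 0, 3}; // shares no terms with docA, so distance is about 1
        System.out.println(cosineDistance(docA, docB));
        System.out.println(cosineDistance(docA, docC));
    }
}
```

Because the whole range of distances is only 0 to 1 for such vectors, a convergence threshold of .5 covers half of that range, which is the motivation for choosing a smaller value.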
Ted - thank you for taking the time to point out that in Multivariate
Systems, there are many interpretations to what would seem ordinary and
non-debatable in scalar mathematics.
For example, in the relational algebra world, I know of seven different
interpretations of relational division.
SCott
Hello Mahout Trainers and Gurus:
I am plowing through the sample code from Mahout in Action. Have been
trying to run the example NewsKMeansClustering using the Reuters dataset.
Found Alex Ott's Blog
http://alexott.blogspot.co.uk/2012/07/getting-started-with-examples-from.html
And downloaded
();
while ( reader.next(key, value) )
{
System.out.println(key.toString() + " belongs to cluster " +
value.toString());
}
reader.close();
}
}
I'm running out of ideas.
SCott
From: Scott C. Cote scottcc...@gmail.com
Date: Friday, December 27, 2013 1:56 PM
To: user@mahout.apache.org user
All,
Two questions related to Quick tour of text analysis using the Mahout
command line
1. metrics:
When moving through the process of performing the cluster analysis one can
use many different metrics. In the tour, the choice was made to use the
Cosine metric. Are there any problems that
.
On Thursday, December 19, 2013 2:04 PM, Scott C. Cote
scottcc...@gmail.com wrote:
I manually deleted the temp folder too (After 2 failed starts).
Would it be helpful for me to upload my shells that encapsulate all of the
commands posted on the tour? They reflect the current state of reuters
and .8
on the wiki link instructions, the seqdumper should have
been on rowsimilarity/part-r-* and not on matrix/matrix for determining
similar documents.
Hope this helps. Sorry again for the confusion.
On Friday, December 20, 2013 4:51 PM, Scott C. Cote
scottcc...@gmail.com wrote:
Suneel
20, 2013 4:51 PM, Scott C. Cote
scottcc...@gmail.com wrote:
Suneel and others,
I am still getting the strange results when I do the tour. Suneel: I
manually wiped out the temp folder and also deleted the reuters-XXX
folders.
Also, per your advice I added the -ow option to all of the commands
/clusteredPoints \
I am assuming you had run kmeans clustering, if so the clusters wouldn't
overlap. You would see cluster overlap if u were to run fuzzy kmeans
clustering.
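The difference between the two can be sketched in plain Java: k-means gives each point to exactly one cluster (the nearest centroid), while fuzzy k-means gives each point a degree of membership in every cluster. The membership formula used below is the standard fuzzy c-means one; the class name `ClusterAssignmentSketch`, the Euclidean distance, and the sample values are assumptions for illustration, not Mahout's code.

```java
/** Hypothetical illustration of hard vs. fuzzy cluster assignment; not Mahout code. */
final class ClusterAssignmentSketch {

    /** Euclidean distance; assumes both points have the same dimension. */
    static double distance(double[] p, double[] c) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - c[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** k-means style: index of the single nearest centroid. */
    static int hardAssign(double[] point, double[][] centroids) {
        int best = 0;
        for (int j = 1; j < centroids.length; j++) {
            if (distance(point, centroids[j]) < distance(point, centroids[best])) {
                best = j;
            }
        }
        return best;
    }

    /** Fuzzy c-means style: memberships in every cluster, summing to 1; fuzziness m must be > 1. */
    static double[] fuzzyMemberships(double[] point, double[][] centroids, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("m must be > 1");
        }
        double[] u = new double[centroids.length];
        double[] d = new double[centroids.length];
        for (int j = 0; j < centroids.length; j++) {
            d[j] = distance(point, centroids[j]);
            if (d[j] == 0.0) { // point sits on a centroid: full membership there
                u[j] = 1.0;
                return u;
            }
        }
        double exponent = 2.0 / (m - 1.0);
        for (int j = 0; j < centroids.length; j++) {
            double sum = 0.0;
            for (int k = 0; k < centroids.length; k++) {
                sum += Math.pow(d[j] / d[k], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {4, 0}};
        double[] point = {1, 0};
        System.out.println(hardAssign(point, centroids));
        double[] u = fuzzyMemberships(point, centroids, 2.0);
        System.out.println(u[0] + " " + u[1]);
    }
}
```

With hard assignment the point belongs only to the first cluster, so clusters cannot share points; with fuzzy memberships the same point has a non-zero weight in both clusters, which is where overlap comes from.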
On Friday, December 20, 2013 7:06 PM, Scott C. Cote
scottcc...@gmail.com wrote:
Suneel,
Thank you for your help
All,
I am a newbie Mahout user and am trying to use the Quick tour of text
analysis using the Mahout command line. Thank you to whoever contributed
to that page.
https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
Went all the way
with 21578 rows and 41807 columns to
reuters-matrix/matrix
Dec 18, 2013 4:01:13 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 3453 ms (Minutes: 0.05755)
On Thursday, December 19, 2013 12:14 PM, Scott C. Cote
scottcc...@gmail.com wrote:
All,
I am a newbie Mahout user and am trying
to see the output.
b) Also, what was the message at the end of the RowId job? It should read
something like 'Wrote out matrix with 21578 rows and 19515 columns to
reuters-matrix/matrix'.
On Thursday, December 19, 2013 12:14 PM, Scott C. Cote
scottcc...@gmail.com wrote:
All,
I am a newbie
that).
It should be good enough to run the Rowsimilarity job again.
On Thursday, December 19, 2013 1:46 PM, Scott C. Cote
scottcc...@gmail.com wrote:
Suneel,
I'm going to do the similarity part of the tour over - my laptop went to
sleep in the middle of the run of the rowsimilarity job.
Maybe the job