Jeff Eastman wrote:
Jeff Eastman wrote:
Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes the
probability that a point will go to any new cluster quite low. You can
compensate somewhat for this with different values of alpha.
Could
Jeff Eastman wrote:
Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes the
probability that a point will go to any new cluster quite low. You can
compensate somewhat for this with different values of alpha.
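To make the diffuse-prior point concrete, here is a minimal Python sketch of Chinese-restaurant-process assignment probabilities for 1-D Gaussian clusters. It is a textbook illustration, not Mahout's Dirichlet clustering code. The known observation variance `sigma2`, the Gaussian prior on cluster means (`mu0`, `tau2`), and the function names are all assumptions. A new cluster is scored by the prior predictive density, which flattens as `tau2` grows, so the chance of opening a new cluster drops; raising `alpha` pushes it back up.

```python
import math


def normal_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def assignment_probs(x, clusters, alpha, sigma2, mu0, tau2):
    """Chinese-restaurant-process assignment probabilities for a point x.

    clusters:  list of (count, mean) pairs; cluster means are treated as known.
    alpha:     DP concentration parameter.
    sigma2:    observation variance (assumed known and shared).
    mu0, tau2: mean and variance of the assumed Gaussian prior over cluster means.
    Returns (probabilities for existing clusters, probability of a new cluster).
    """
    # Existing cluster k: weight n_k * N(x | mean_k, sigma2).
    weights = [n * normal_pdf(x, m, sigma2) for n, m in clusters]
    # New cluster: weight alpha * prior predictive N(x | mu0, sigma2 + tau2).
    # A diffuse prior (large tau2) spreads this density thin, so it is small at any x.
    new_weight = alpha * normal_pdf(x, mu0, sigma2 + tau2)
    total = sum(weights) + new_weight
    return [w / total for w in weights], new_weight / total


if __name__ == "__main__":
    clusters = [(50, 0.0), (50, 10.0)]
    x = 1.5  # a point a little way from the cluster at 0.0
    for alpha in (1.0, 10.0):
        for tau2 in (1.0, 100.0, 10000.0):
            _, p_new = assignment_probs(x, clusters, alpha, 1.0, 0.0, tau2)
            print(f"alpha={alpha:5.1f} tau2={tau2:8.1f}  P(new cluster)={p_new:.4f}")
```

Running it shows P(new cluster) shrinking as `tau2` grows for a fixed `alpha`, and a tenfold `alpha` roughly offsetting a much more diffuse prior.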
Could you elaborate more on
Then we could make a profile that turns off the code gen and turns on
the build helper to add the generated source dir instead.
On Fri, Feb 5, 2010 at 4:49 PM, Robin Anil wrote:
> It's just meant to be a dev-only hack :)
>
>
> On Sat, Feb 6, 2010 at 3:09 AM, Benson Margulies wrote:
>
>> Yes, the c
It's just meant to be a dev-only hack :)
On Sat, Feb 6, 2010 at 3:09 AM, Benson Margulies wrote:
> Yes, the codegen could drop a timestamp file. It's a fair amount of
> work, and if we're killing this code for HPCC I'm dubious.
>
> If I could make the split work I could do this next.
>
>
> On Fri
Grant,
Would the TLP be Mahout or under a different name?
I also like the idea that it does not necessarily have to be a 1:1 port.
Kay Kay,
I've changed my mind (going the wrapper route). I think it would be nice to
explore the possibilities with just a subset of the algorithms.
That would be a go
Yes, the codegen could drop a timestamp file. It's a fair amount of
work, and if we're killing this code for HPCC I'm dubious.
If I could make the split work I could do this next.
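Benson's timestamp idea can be sketched as follows: regenerate only when some template is newer than a stamp file left behind by the previous generation run. This is Python for illustration only; in the actual build this logic would presumably live in the Maven code generation mojo, and the file names, `codegen_is_stale`, and `run_codegen_if_needed` are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def codegen_is_stale(template_dir, stamp_file):
    """True if the stamp file is missing or any template file is newer than it."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    stamp_mtime = stamp.stat().st_mtime
    return any(
        p.stat().st_mtime > stamp_mtime
        for p in Path(template_dir).rglob("*")
        if p.is_file()
    )


def run_codegen_if_needed(template_dir, stamp_file, generate):
    """Call generate() only when templates changed, then refresh the stamp file.

    Returns True if code generation ran, False if it was skipped.
    """
    if not codegen_is_stale(template_dir, stamp_file):
        return False
    generate()
    Path(stamp_file).touch()  # creates the stamp, or bumps its mtime to now
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        templates = Path(root, "templates")
        templates.mkdir()
        template = templates / "Vector.java.t"  # hypothetical template name
        template.write_text("template body")
        stamp = Path(root, "codegen.stamp")

        gen = lambda: print("generating")
        print(run_codegen_if_needed(templates, stamp, gen))  # first build: runs
        print(run_codegen_if_needed(templates, stamp, gen))  # nothing changed: skipped
        # Simulate an edited template by pushing its mtime past the stamp's.
        newer = stamp.stat().st_mtime + 10
        os.utime(template, (newer, newer))
        print(run_codegen_if_needed(templates, stamp, gen))  # template changed: runs
```

The stamp lives outside the template directory so that touching it never counts as a template change.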
On Fri, Feb 5, 2010 at 12:19 PM, Drew Farris wrote:
> So, I'm running: mvn -o install -DskipTests=true at project r
Thanks everyone for your responses so far.
The Apache Hadoop dependency was something I thought about initially, but I
still went ahead and asked the question anyway.
At this time, it would be a better use of resources and time to come up with
a wrapper or HTTP server/client setup of some sort.
My
Thanks! 25 seconds is a winner. It can go down to 15 if recompiling the
parent, math and mojo modules is turned off.
On Fri, Feb 5, 2010 at 10:49 PM, Drew Farris wrote:
> So, I'm running: mvn -o install -DskipTests=true at project root (in mahout)
>
> Comment out or remove the maven-assembly-plug
So, I'm running: mvn -o install -DskipTests=true at project root (in mahout)
Comment out or remove the maven-assembly-plugin definition in
core/pom.xml -- it reduced my core build time from 26s to 6s -- I can
submit a patch for this.
Mahout math is still 17s here due to code generation. I'm wonde
[
https://issues.apache.org/jira/browse/MAHOUT-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated MAHOUT-272:
-
Resolution: Fixed
Assignee: Drew Farris
Status: Resolved (was: Patch Available)
> Add lice
I just updated it here.
http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
Let's rename/refactor the classes and get the basic Avro thing in for 0.3, so
that people who use it get a smooth upgrade to 0.4.
Robin
On Fri, Feb 5, 2010 at 10:32 PM, Drew Farris wrote:
> On Fri, Feb 5, 2010 at 1
Yes, for editing I use Eclipse in the same fashion. If I want to try out a
job and see how it performs on Hadoop, I need the job compiled fast.
On another note, I think there will be a lot of dead code in the job (with
all the jar files bundled). Is there an optimiser for that, i.e. to remove
classes which
mvn install to generate the job: around 2-3 mins, since it also generates the
bz2, zip and gz archives.
mvn compile otherwise (15 of the 33 secs are spent compiling math).
On Fri, Feb 5, 2010 at 10:18 PM, Drew Farris wrote:
> On Fri, Feb 5, 2010 at 3:27 AM, Robin Anil wrote:
> > When developing mahout core/util/examples
On Fri, Feb 5, 2010 at 11:53 AM, Jake Mannix wrote:
>
> Which is not to say that we shouldn't continue work on them, let's keep the
> patches going and up to date, let's just not worry about holding up 0.3
> until they're fully tested and checked in.
Yes, absolutely. I'm also interested in hearin
Sounds great to me.
On Fri, Feb 5, 2010 at 11:50 AM, Ted Dunning wrote:
> Makes a lot of sense. Drew?
>
> On Fri, Feb 5, 2010 at 8:48 AM, Jake Mannix wrote:
>
>> So are we really planning on all this structured document stuff and Avro for
>> 0.3? Can we just try and finish up what was alrea
On Fri, Feb 5, 2010 at 8:48 AM, Jake Mannix wrote:
> So are we really planning on all this structured document stuff and Avro
> for 0.3? Can we just try and finish up what was already scoped for 0.3 and
> have a quick turnaround for getting things which have only been really
> started worked on
Makes a lot of sense. Drew?
On Fri, Feb 5, 2010 at 8:48 AM, Jake Mannix wrote:
> So are we really planning on all this structured document stuff and Avro for
> 0.3? Can we just try and finish up what was already scoped for 0.3 and have
> a quick turnaround for getting things which have onl
I usually do an initial compilation using mvn package. Then, during
development I use IntelliJ's incremental compilation which generally only
takes a few seconds. Since that compilation doesn't handle things like
copying resources, I get caught out and surprised now and again, but this
works almo
So are we really planning on all this structured document stuff and Avro for
0.3? Can we just try and finish up what was already scoped for 0.3 and have
a quick turnaround for getting things which have only been really started
worked on in the past week or so for 0.4 sometime next month?
-jake
On Fri, Feb 5, 2010 at 3:27 AM, Robin Anil wrote:
> When developing mahout core/util/examples we don't need to generate math
> often and don't need to tar/gzip/bzip2 the jar files. We are mostly concerned
> with the job file/jar file.
> Can't there be another target, like develop, which does this? (wa
On Fri, Feb 5, 2010 at 11:17 AM, Ted Dunning wrote:
> I just marked the 0.1 and 0.2 releases as released (about time). This makes
> the JIRA road map feature more usable.
>
> See here for the live version of this summary:
> https://issues.apache.org/jira/browse/MAHOUT?report=com.atlassian.jira.pl
[
https://issues.apache.org/jira/browse/MAHOUT-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Farris updated MAHOUT-274:
---
Attachment: mahout-avro-examples.tar.gz
Very rudimentary exploration of using avro to produce writabl
Use avro for serialization of structured documents.
---
Key: MAHOUT-274
URL: https://issues.apache.org/jira/browse/MAHOUT-274
Project: Mahout
Issue Type: Improvement
Reporter: Drew
Yum Yum.
0.1 59 issues
0.2 66 issues
0.3 91 issues - 13 left
On Fri, Feb 5, 2010 at 9:47 PM, Ted Dunning wrote:
> I just marked the 0.1 and 0.2 releases as released (about time). This makes
> the JIRA road map feature more usable.
>
> See here for the live version of this summary:
>
> ht
Surely there is a clever way to use annotations for this. Not that I know
what it might be.
On Fri, Feb 5, 2010 at 4:05 AM, Robin Anil (JIRA) wrote:
> If we go like this we might have too many options. Any way to streamline
> this?
>
> One thought I have is to have package level Main classes i
I just marked the 0.1 and 0.2 releases as released (about time). This makes
the JIRA road map feature more usable.
See here for the live version of this summary:
https://issues.apache.org/jira/browse/MAHOUT?report=com.atlassian.jira.plugin.system.project:roadmap-panel
On Fri, Feb 5, 2010 at 3:16
One thought on these lines is that we should start the process to be a TLP,
then we could have a subproject explicitly dedicated to C++ (or any other
language) and there wouldn't necessarily need to be a 1-1 port.
-Grant
On Feb 5, 2010, at 12:56 AM, Kay Kay wrote:
> If there were an effort to
[
https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830077#action_12830077
]
Robin Anil commented on MAHOUT-185:
---
I like the script, as I am running k-means these days.
Reviving this thread. Copy and paste the whole thing as we move forward.
Current Snapshot
Key Summary
> MAHOUT-221 Implementation of FP-Bonsai Pruning for fast pattern mining Done
> MAHOUT-227 Parallel SVM In Progress
> MAHOUT-240 Parallel version of Perceptron Little Progr
[
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830056#action_12830056
]
Robin Anil commented on MAHOUT-153:
---
Any progress on this? Will it be ready soon or shoul
[
https://issues.apache.org/jira/browse/MAHOUT-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil resolved MAHOUT-221.
---
Resolution: Fixed
Committed
> Implementation of FP-Bonsai Pruning for fast pattern mining
> ---
[
https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil resolved MAHOUT-220.
---
Resolution: Fixed
Committed.
> Mahout Bayes Code cleanup
> -
>
>
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-237:
--
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Map/Reduce Implementation of Docum
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-237:
--
Status: Patch Available (was: Reopened)
Working implementation of DictionaryVectorizer with tf, tfi
I am committing the first level of changes so that Drew can work on it. I have
updated the patch on the issue as a reference. Ted, please take a look when
you get time. The names will change correspondingly.
What I have right now is
4 Main Entry points
DocumentProcessor - does SequenceFile => StringTu
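The DictionaryVectorizer work turns tokenized documents into term vectors with tf and tf-idf weights. As a rough illustration only, here is a Python sketch of the two passes involved: build a term dictionary, then weight each document's terms. It uses the plain textbook tf × log(N/df) formula; Mahout's actual weighting and its Hadoop job structure may differ, and `tfidf_vectors` plus the token-list input (standing in for the StringTuple sequence files) are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse tf-idf vectors.

    docs: list of token lists (a stand-in for one StringTuple per document).
    Returns (dictionary mapping term -> index, list of {index: weight} dicts).
    Weighting is the textbook tf * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    # Dictionary pass: assign each distinct term a stable integer index.
    dictionary = {term: i for i, term in enumerate(sorted(df))}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts for this document
        # Terms present in every document get log(1) = 0 and are kept as explicit zeros.
        vectors.append(
            {dictionary[t]: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        )
    return dictionary, vectors


if __name__ == "__main__":
    docs = [["mahout", "hadoop", "mahout"], ["mahout", "avro"]]
    dictionary, vectors = tfidf_vectors(docs)
    print(dictionary)
    print(vectors)
```

Keeping the dictionary separate from the vectors mirrors the idea of a dictionary pass followed by a vectorizing pass, which is what lets the weighting step work with integer indices instead of strings.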
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-237:
--
Attachment: MAHOUT-237-tfidf.patch
4 Main Entry points
DocumentProcessor - does SequenceFile => StringT
When developing mahout core/util/examples we don't need to generate math
often and don't need to tar/gzip/bzip2 the jar files. We are mostly concerned
with the job file/jar file.
Can't there be another target, like develop, which does this? (Waiting 2-3 mins
for a 2-line change is frustrating.)
Robin