showed you here and I should check
some other place, like my input data?
Thanks in advance for any help!
--David Starina
Anyhow, I'm +1 for removing MAHOUT_LOCAL, but I believe the deprecated
MapReduce-based code still makes sense if it is running well on Ignite.
On Mon, Mar 21, 2016 at 8:20 AM, David Starina <david.star...@gmail.com>
wrote:
> Has anyone tried to run the deprecated MapReduce code
Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
performance improvement good enough to reconsider leaving those algorithms
in Mahout?
On Mon, Mar 21, 2016 at 12:45 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:
> Yes I agree; will leave the question open a
What are the best values for those two parameters? I usually only read
suggestions on how to set the number of iterations (=maxIter). Some suggest
it is best to set it as high as 1000 iterations. However, what about the number
of iterations per document? How is this one really used and what would be
you want to know cluster
> inclusion or get a list of similar docs?
>
> On Feb 23, 2016, at 1:01 PM, David Starina <david.star...@gmail.com>
> wrote:
>
> Guys, one more question ... Are there some incremental methods to do this?
> I don't want to run the whole job
About the last question: it probably has something to do with setting the
max iterations and max iterations per document to the same value ... What
is the "number of iterations per document" really doing?
--David
On Thu, Mar 10, 2016 at 5:39 PM, David Starina <david.star...@gma
. Is there something I don't understand about this algorithm? Why
would one iteration take that much longer just because you run more
iterations?
--David
On Thu, Mar 10, 2016 at 2:24 PM, David Starina <david.star...@gmail.com>
wrote:
> How does memory requirement grow with the number of topics?
How does the memory requirement grow with the number of topics? A little
experimentation shows me that the number of documents doesn't matter as much as
the number of topics ... Does the memory requirement grow exponentially
with the number of topics?
--David
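
A rough back-of-the-envelope sketch for the question above, assuming the model's memory is dominated by a dense topic-by-term matrix of 8-byte doubles (an assumption about the implementation, not something confirmed in this thread). Under that assumption memory grows linearly with the number of topics, not exponentially. The vocabulary size and the helper name are hypothetical.

```python
def topic_term_matrix_bytes(num_topics, num_terms, bytes_per_value=8):
    """Bytes needed for a dense num_topics x num_terms matrix.

    Assumes 8-byte doubles by default; this is an illustrative model of
    memory use, not a measurement of any particular LDA implementation.
    """
    return num_topics * num_terms * bytes_per_value


if __name__ == "__main__":
    vocab_size = 100_000  # hypothetical vocabulary size
    for k in (20, 100, 500):
        mib = topic_term_matrix_bytes(k, vocab_size) / 2**20
        print(f"{k:4d} topics -> {mib:8.1f} MiB")
```

If several copies of the matrix are held at once (for example one per worker thread), the estimate scales by that factor, which may matter on nodes with 3 GB of memory.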
On Thu, Mar 10, 2016 at 11:43 AM, David
Hi,
I realize MapReduce algorithms are not the "hot new stuff" anymore, but I
am playing around with LDA. I have some problems with memory; can you
suggest how to set the parameters to make this work?
I am running on a virtual cluster on my laptop - two nodes with 3 GB of
memory each
they work well.
>
> The query to the KNN engine is a document, each field mapped to the
> corresponding field of the index. The result is the k nearest neighbors to
> the query doc.
>
>
> > On Feb 14, 2016, at 11:05 AM, David Starina <david.star...@gmail.com>
ll lead you to a good similarity or distance measure.
> > As I recall, Spark does provide an LDA implementation. Gensim provides an
> > API for doing LDA similarity out of the box. Vowpal Wabbit is also worth
> > looking at, particularly for a large dataset.
> > Hope th
Hi,
I need to build a system to determine N (e.g. 10) most similar documents to
a given document. I have some (theoretical) knowledge of Mahout algorithms,
but not enough to build the system. Can you give me some suggestions?
At first I was researching Latent Semantic Analysis for the task, but
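
A minimal sketch of the "N most similar documents" task described above, assuming plain term-frequency vectors and cosine similarity. This is a simple baseline for illustration, not Mahout's API and not the LSA or LDA approaches mentioned in the thread; the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query, docs, n=10):
    """Return the n (doc_id, score) pairs most similar to the query text.

    docs maps a document id to its raw text; tokenization is a naive
    lowercase whitespace split.
    """
    q = Counter(query.lower().split())
    scored = [
        (cosine(q, Counter(text.lower().split())), doc_id)
        for doc_id, text in docs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:n]]


if __name__ == "__main__":
    corpus = {
        "a": "mahout lda topic model",
        "b": "cooking pasta recipe",
        "c": "lda topic",
    }
    for doc_id, score in top_n_similar("lda topic model", corpus, n=2):
        print(doc_id, round(score, 3))
```

Swapping the term-count vectors for TF-IDF weights or per-document topic distributions keeps the same ranking step; only the vector construction changes.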
Hi,
I am not sure why I cannot find the info I am looking for online, probably
not searching in the right way, so I am hoping you guys will be able to
point me in the right direction.
I have set up a Mahout project in IntelliJ IDEA on my machine. I created a
class extending AbstractJob to run
can just leave it at that.
Best regards,
David
On Mon, Feb 1, 2016 at 11:52 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> the user list will not let attachments thru.
>
> On Sun, Jan 31, 2016 at 11:59 PM, David Starina <david.star...@gmail.com>
> wrote:
>
>
Hi,
I have a problem importing the project to Eclipse - I get the error "Could
not update project mahout-mr configuration". Attaching the error as an image.
Has anyone seen this problem before? I am using Eclipse 4.5.1 (Mars.1) on
Fedora 22. I did a Maven build successfully, installed m2eclipse and
You can also check out the implementation in MLlib:
https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec
On Wed, May 13, 2015 at 9:11 PM, Dan Dong dongda...@gmail.com wrote:
Thanks Andrew, I will turn to DL4J.
Cheers,
Dan
2015-05-13 10:34 GMT-05:00 Andrew
Hi,
as Chirag said, try LDA. You can also check an implementation of pLSA, but
it is not part of Mahout; you can find it here:
https://github.com/akopich/dplsa
--David
On Thu, Mar 26, 2015 at 2:01 PM, 3316 Chirag Nagpal
chiragnagpal_12...@aitpune.edu.in wrote:
A better approach I can think