Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
Hi again, and thanks for the enthusiasm!

I compiled trunk with the hadoop2 profile. It didn't work at first because
some Canopy tests were failing, but when I skipped the tests it compiled, and
when I ran the tests afterwards they passed.
I used the Hadoop version I have installed, so I just added the line:
2.0.0-cdh4.6.0
to the pom.xml and typed:
mvn -DskipTests clean install -Phadoop2
Then:
mvn test

Then I tried it with these settings:
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CLASSPATH=/home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
export MAHOUT_HOME=/home/myCompany/Downloads/mahout9

Running the command gives me this:
[root@node01 mahout9]# bin/mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I even tried with:
export HADOOP_HOME=/.../hadoop
export HADOOP_HOME=/.../hadoop-0.20-mapreduce
export HADOOP_HOME=/.../hadoop-mapreduce

and it still gives me the same result.

Recompiling with 2.0.0 or 2.0.0-mr1-cdh4.6.0 didn't work either.

Any ideas?
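A NoSuchMethodError means the JVM found no method with that exact name and descriptor in the class it actually loaded; here Mahout expects ProgramDriver.driver(String[]) to return void (the trailing V in the descriptor). A minimal diagnostic sketch, using only standard Java reflection (the class name WhichJar is hypothetical, not part of Mahout or Hadoop), that reports which jar a class was loaded from and what a method returns at runtime:

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Mahout or Hadoop.
public class WhichJar {

    // Returns the jar or directory the class was loaded from, or a marker
    // for classes with no code source (e.g. JDK classes from the bootstrap loader).
    public static String locationOf(String className) throws ClassNotFoundException {
        Class<?> cls = Class.forName(className);
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "<bootstrap/unknown>";
        }
        return src.getLocation().toString();
    }

    // Returns the runtime return type of a public method, e.g. "void" or "int".
    public static String returnTypeOf(String className, String methodName, Class<?>... params)
            throws ReflectiveOperationException {
        Method m = Class.forName(className).getMethod(methodName, params);
        return m.getReturnType().getName();
    }

    public static void main(String[] args) throws Exception {
        // Assumes Hadoop is on the classpath when no argument is given.
        String cls = args.length > 0 ? args[0] : "org.apache.hadoop.util.ProgramDriver";
        System.out.println(cls + " loaded from: " + locationOf(cls));
        // The error above expects driver(String[]) to return void.
        System.out.println("driver(String[]) returns: "
                + returnTypeOf(cls, "driver", String[].class));
    }
}
```

Compiled and run with the classpath the hadoop script uses, for example java -cp "$(hadoop classpath):." WhichJar, it should show whether the ProgramDriver being picked up matches the one Mahout was compiled against.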



2014-03-05 22:42 GMT+01:00 Andrew Musselman :

> I mean "balance the risk aversion against the value of new features" duh.
>
>
> On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
> > Yeah, for sure; balancing clients' risk aversion to technical features is
> > why we often recommend vendor solutions.
> >
> > Having a little button to choose a newer version of a component in the
> > Manager UI (even with a confirmation dialog that said "Are you sure? Are
> > you crazy?") would be more palatable to some teams than installing
> > tarballs, is what I'm getting at.
> >
> >
> > On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen  wrote:
> >
> >> You can always install whatever version of anything on your cluster
> >> that you want. It may or may not work, but often happens to, at least
> >> for whatever you need it to do.
> >>
> >> It's just the same as it is without a packaged distribution -- dump
> >> new tarballs and cross your fingers. Nothing is weird or different
> >> about the setup or layout. That is the "here be dragons" solution,
> >> already.
> >>
> >> You go with support from a packaged distribution when you want a "here
> >> be no dragons" solution. Everything else is by definition already
> >> something you can and should do yourself outside of a packaged
> >> distribution. And really -- you freely can, and it's not hard, if you
> >> know what you are doing.
> >>
> >> On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman wrote:
> >> > Feels like just yesterday :)
> >> >
> >> > Consider this a feature request to have more flexible component
> >> > versioning, even with a caveat/"here be dragons" warning. I know that
> >> > complicates things, but people do use your releases for a long time. I
> >> > personally wished I could upgrade Pig on CDH 4 for new features, but
> >> > there was no simple way on a managed cluster.
> >> >
> >> >
> >> > On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen  wrote:
> >> >
> >> >> I don't understand this -- CDH always bundles the latest release.
> >> >>
> >> >> You know that CDH4 was released in July 2012, right? So it included
> >> >> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
> >> >> month after it began beta 2.
> >> >>
> >> >> CDH follows semantic versioning and won't introduce changes that are
> >> >> not backwards-compatible in a minor version update. 0.x releases of
> >> >> Mahout act like major version changes -- not backwards compatible. So
> >> >> 4.x will always be 0.7 and 5.x will always be 0.8.
> >> >>
> >> >> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov wrote:
> >> >> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen wrote:
> >> >> >
> >> >> >> I don't follow what here makes you say they are "cut down"
> >> >> >> releases?
> >> >> >>
> >> >> >
> >> >> > Meaning it seems to be pretty much 2 releases behind the official
> >> >> > one. But I definitely don't follow CDH developments in this
> >> >> > department; you seem to be in a better position to explain the
> >> >> > existing patch level there, so I defer to you to explain why this
> >> >> > patch level is not there.
> >> >> >
> >> >> > -d
> >> >>
> >>
> >
> >
>
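Sean's versioning argument above can be stated as a small rule. The sketch below is hypothetical (VersionRule is illustrative, not CDH or Mahout code) and encodes only what he describes: within one major version an update is treated as backwards compatible, and for 0.x releases the minor number acts as the major one, so Mahout 0.7 to 0.8 counts as a breaking change.

```java
// Hypothetical illustration of the semantic-versioning rule described above;
// not CDH or Mahout code. Only the first two numeric components are examined.
public class VersionRule {

    // Parses "major.minor[.anything]" into {major, minor}.
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    // True if moving between the two versions should not break callers.
    public static boolean isCompatible(String from, String to) {
        int[] a = majorMinor(from);
        int[] b = majorMinor(to);
        if (a[0] != b[0]) {
            return false; // different major version: breaking by definition
        }
        if (a[0] == 0) {
            return a[1] == b[1]; // 0.x: a minor bump behaves like a major one
        }
        return true; // same major (>= 1): minor/patch updates stay compatible
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("0.7", "0.8"));     // false
        System.out.println(isCompatible("2.0.0", "2.3.0")); // true
    }
}
```

Under this rule a distribution that pins compatibility per major line would keep one Mahout 0.x release per major version, which matches the CDH4/0.7 and CDH5/0.8 pairing Sean mentions.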


Mahout with Storm/Spark

2014-03-06 Thread vineet yadav
Hi,
I am using the Mahout LDA algorithm for topic modeling on a huge number of
documents (500k or more). Mahout is taking a lot of time, so I am looking at
alternatives. I found this article
(http://www.oracle.com/technetwork/articles/java/micro-1925135.html), where
Storm is used with Mallet for real-time topic modeling. I want to know if
anyone has tried Storm or Spark with Mahout to speed up the process.

Thanks
Vineet Yadav


Re: Rework our website

2014-03-06 Thread Kevin Moulart
Hi, I also prefer the second one.

While I'm at it, several links point to missing pages. I just clicked on
all the links present on this page:
http://mahout.apache.org/users/basics/quickstart.html

And these links are broken:
http://mahout.apache.org/users/basics/recommender-documentation.html
http://mahout.apache.org/users/classification/partial-implementation.html
http://mahout.apache.org/users/basics/TasteCommandLine
http://mahout.apache.org/users/recommender/recommendationexamples.html
http://mahout.apache.org/users/basics/parallel-frequent-pattern-mining.html
http://mahout.apache.org/users/basics/mahout.ga.tutorial.html
http://hadoop.apache.org.html/

That's just the ones I found in 2 minutes on the quickstart page.
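Clicking through by hand works for one page; the same check can be automated. A minimal sketch, assuming Java 11+ (java.net.http) and a naive regex for href attributes rather than a real HTML parser (LinkCheck is a hypothetical class, not existing site tooling):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical broken-link checker sketch; assumes Java 11+ (java.net.http).
public class LinkCheck {

    // Naive href extraction; fine for a quick audit, not a real HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns absolute http(s) links resolved against baseUrl, with fragments
    // stripped and duplicates removed (first-seen order kept).
    public static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // in-page anchors
            }
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                if (!"http".equals(scheme) && !"https".equals(scheme)) {
                    continue; // skips mailto:, javascript:, etc.
                }
                String s = resolved.toString();
                int hash = s.indexOf('#');
                links.add(hash >= 0 ? s.substring(0, hash) : s);
            } catch (IllegalArgumentException e) {
                // href is not a valid URI; skip it
            }
        }
        return new ArrayList<>(links);
    }

    // Treats client and server errors (e.g. 404) as broken.
    public static boolean isBroken(int status) {
        return status >= 400;
    }

    public static void main(String[] args) throws Exception {
        String page = args.length > 0 ? args[0]
                : "http://mahout.apache.org/users/basics/quickstart.html";
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        String html = client.send(HttpRequest.newBuilder(URI.create(page)).build(),
                HttpResponse.BodyHandlers.ofString()).body();
        for (String link : extractLinks(page, html)) {
            try {
                HttpRequest head = HttpRequest.newBuilder(URI.create(link))
                        .method("HEAD", HttpRequest.BodyPublishers.noBody())
                        .build();
                int status = client.send(head, HttpResponse.BodyHandlers.discarding()).statusCode();
                if (isBroken(status)) {
                    System.out.println(status + " " + link);
                }
            } catch (java.io.IOException e) {
                // e.g. an unresolvable host such as a malformed domain
                System.out.println("ERROR " + link + " (" + e + ")");
            }
        }
    }
}
```

HEAD requests keep the check cheap; a few servers answer HEAD differently from GET, so any reported link is worth confirming in a browser.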

Best Regards,
Kevin


2014-03-05 23:43 GMT+01:00 Sebastian Schelter :

> At the moment, only committers can change the website, unfortunately. If
> you have text to add, I'm happy to work it in and add your name to our
> contributors list in the CHANGELOG.
>
> Best,
> Sebastian
>
>
>
> On 03/05/2014 04:58 PM, Scott C. Cote wrote:
>
>> I recently took the text tour of Mahout, but I couldn't figure out a
>> way to contribute updates to the tour (some of the file names have
>> changed, etc.).
>>
>> How would I start? (This was part of my offer to help with the
>> documentation of Mahout.)
>>
>> Scott
>>
>> On 3/5/14 9:47 AM, "Pat Ferrel"  wrote:
>>
>>> What, no centered text??
>>>
>>> ;-)
>>>
>>> Love either.
>>>
>>> BTW users are no longer able to contribute content to the wiki. Most CMSs
>>> have a way to allow input that is moderated. Might this make getting
>>> documentation help easier? Allow anyone to contribute, but committers can
>>> filter out the bad, sort of like submitting patches.
>>>
>>> On Mar 5, 2014, at 4:11 AM, Sebastian Schelter  wrote:
>>>
>>> Hi everyone,
>>>
>>> In our latest discussion, I argued that the lack (and errors) of
>>> documentation on our website is one of the main pain points of Mahout
>>> atm. To be honest, I'm also not very happy with the design; the fonts and
>>> spacing in particular make it super hard to read long articles. This also
>>> discourages me from adding articles and documentation.
>>>
>>> I think we should have a beautiful website, where it is fun to add new
>>> stuff.
>>>
>>> My design skills are pretty limited, but fortunately my brother is an art
>>> director! I asked him to make our website a bit more beautiful without
>>> changing too much of the structure, so that a redesign wouldn't take too
>>> long.
>>>
>>> I really like the results and would volunteer to dig out my CSS skills
>>> and do the redesign, if people agree.
>>>
>>> Here are his drafts; I like the second one best:
>>>
>>> https://people.apache.org/~ssc/mahout/mahout.jpg
>>> https://people.apache.org/~ssc/mahout/mahout2.jpg
>>>
>>> Let me know what you think!
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>
>>
>


-- 
Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45


Re: Rework our website

2014-03-06 Thread Sebastian Schelter
Thank you very much! Could you create a JIRA ticket and post the links
there? That would be awesome; then we can track that this stuff gets fixed.


Best,
Sebastian

Re: Rework our website

2014-03-06 Thread Kevin Moulart
Here you go:
https://issues.apache.org/jira/browse/MAHOUT-1434

This is the first JIRA issue I've posted, so please tell me if I did
something wrong in choosing the categories or anything else.




Re: Rework our website

2014-03-06 Thread Suneel Marthi
I fixed some of the broken links. For some of the others, e.g.
TasteCommandline and Recommendationexamples, either the pages have not been
migrated or the links have to be purged?

Re: Rework our website

2014-03-06 Thread Sebastian Schelter

Could you add the missing pages to the JIRA issue? I'll have a look later.


Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Gokhan Capan
Kevin,


From trunk, can you build Mahout for hadoop2 using this command:

mvn clean package -DskipTests=true -Dhadoop2.version=


Then, can you verify that you have the right Hadoop jars with the following
command (the pattern is quoted so the shell does not expand it first):

find . -name 'hadoop*.jar'
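The same check can be scripted so that mixed Hadoop builds stand out. A minimal sketch (the HadoopJars class is hypothetical, and it assumes Maven-style jar names such as hadoop-common-2.0.0-cdh4.6.0.jar) that walks a directory and groups hadoop*.jar files by the version suffix in their file names; more than one version group suggests jars from different Hadoop builds are mixed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: a Java take on `find . -name 'hadoop*.jar'` that also
// groups the jars by the version suffix in their file names.
public class HadoopJars {

    // Assumes names like hadoop-<artifact>-<version>.jar, where the version
    // starts with a digit (e.g. hadoop-core-2.0.0-mr1-cdh4.6.0.jar).
    private static final Pattern JAR = Pattern.compile("^hadoop-[a-z0-9-]*?-(\\d.*)\\.jar$");

    public static Map<String, TreeSet<String>> byVersion(Path root) throws IOException {
        Map<String, TreeSet<String>> result = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                String name = p.getFileName().toString();
                if (!name.startsWith("hadoop") || !name.endsWith(".jar")) {
                    return; // not a hadoop*.jar file
                }
                Matcher m = JAR.matcher(name);
                String version = m.matches() ? m.group(1) : "<unparsed>";
                result.computeIfAbsent(version, k -> new TreeSet<>()).add(name);
            });
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        byVersion(root).forEach((version, jars) ->
                System.out.println(version + " -> " + jars));
    }
}
```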



Gokhan



Re: Rework our website

2014-03-06 Thread Suneel Marthi
There is stuff that needs to be migrated over from the old website. See JIRA
for the details.

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
Hi, thanks very much, it seems to have worked!
Compiling with "mvn clean package -Dhadoop2.version=2.0.0-cdh4.6.0" works
and I no longer get the error. However, when running jobs that used to work
with the previous install, such as trainAdaptiveLogistic followed by
validateAdaptiveLogistic, the first works but the second yields an error:

bin/mahout validateAdaptiveLogistic --input /mnt/hdfs/user/myCompany/Echant/echant300k_wh.csv --model /mnt/hdfs/user/myCompany/Echant/Models/echnat.model --auc --scores --confusion
14/03/06 15:53:42 WARN driver.MahoutDriver: No validateAdaptiveLogistic.props found on classpath, will use command-line arguments only
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
	at org.apache.mahout.math.stats.GroupTree$1.<init>(GroupTree.java:171)
	at org.apache.mahout.math.stats.GroupTree.iterator(GroupTree.java:169)
	at org.apache.mahout.math.stats.GroupTree.access$300(GroupTree.java:14)
	at org.apache.mahout.math.stats.GroupTree$2.iterator(GroupTree.java:317)
	at org.apache.mahout.math.stats.TDigest.add(TDigest.java:105)
	at org.apache.mahout.math.stats.TDigest.add(TDigest.java:88)
	at org.apache.mahout.math.stats.TDigest.add(TDigest.java:76)
	at org.apache.mahout.math.stats.OnlineSummarizer.add(OnlineSummarizer.java:57)
	at org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic.mainToOutput(ValidateAdaptiveLogistic.java:107)
	at org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic.main(ValidateAdaptiveLogistic.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I'll try some other tests to see what's working and what's not.
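Both NoSuchMethodError messages in this thread name the missing method in JVM descriptor notation: newArrayDeque()Ljava/util/ArrayDeque; is a method taking no arguments and returning java.util.ArrayDeque, and driver([Ljava/lang/String;)V is one taking a String[] and returning void. The error means the class actually loaded at runtime has no method with that exact name and descriptor. As a minimal sketch (DescriptorDemo is a hypothetical helper, not part of Mahout or Hadoop), such descriptors can be built from Java types following the descriptor rules in the JVM specification:

```java
public class DescriptorDemo {
    // Field descriptor for a single type, per the JVM specification (section 4.3.2).
    static String descriptorOf(Class<?> type) {
        if (type.isArray()) {
            // Class.getName() already uses descriptor form for arrays, e.g. "[Ljava.lang.String;" or "[I".
            return type.getName().replace('.', '/');
        }
        if (type == void.class) return "V";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Method descriptor: "(" + parameter descriptors + ")" + return-type descriptor.
    static String methodDescriptor(Class<?> returnType, Class<?>... paramTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : paramTypes) {
            sb.append(descriptorOf(p));
        }
        return sb.append(')').append(descriptorOf(returnType)).toString();
    }

    public static void main(String[] args) {
        // The two signatures reported as missing in this thread.
        System.out.println("newArrayDeque" + methodDescriptor(java.util.ArrayDeque.class)); // newArrayDeque()Ljava/util/ArrayDeque;
        System.out.println("driver" + methodDescriptor(void.class, String[].class));        // driver([Ljava/lang/String;)V
    }
}
```

Read this way, the trace above says the Queues class found on the runtime classpath lacks a no-argument newArrayDeque(), which is consistent with an older Guava being loaded than the one Mahout was compiled against.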



2014-03-06 15:58 GMT+01:00 Gokhan Capan :

> Kevin,
>
>
> From trunk, can you build mahout for hadoop2 using this command:
>
> mvn clean package -DskipTests=true -Dhadoop2.version=
>
>
> Then can you verify that you have the right hadoop jars with the following
> command:
>
> find . -name 'hadoop*.jar'
>
>
>
> Gokhan
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Sean Owen
That's going to be a Guava version problem; I have seen variants of this
for a while. Hadoop still uses Guava 11.0.2, even in HEAD. In a project
like this you can often get away with compiling against a later version,
even though code that executes on Hadoop will see the older Guava at
runtime. This is an example of that gotcha. I think it may be necessary
to force Mahout to use 11.0.2 and change this code.

I have a sense of déjà vu; I think this has come up before too.
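To confirm this diagnosis on a particular cluster, one option is to ask the JVM which copy of Guava it actually loaded. The sketch below is a hypothetical diagnostic (the class name GuavaProbe is made up, not a Mahout or Hadoop tool), and it only gives a meaningful answer when run with the same classpath as the failing command. It reports the jar that com.google.common.collect.Queues was loaded from and whether that copy has a public no-argument newArrayDeque().

```java
import java.net.URL;
import java.security.CodeSource;

public class GuavaProbe {
    // Loads a class without initializing it; returns null if it is not on the classpath.
    static Class<?> find(String className) {
        try {
            return Class.forName(className, false, GuavaProbe.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    // True if the class is present and has a public method with exactly these parameter types.
    static boolean hasPublicMethod(String className, String methodName, Class<?>... paramTypes) {
        Class<?> c = find(className);
        if (c == null) {
            return false;
        }
        try {
            c.getMethod(methodName, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Jar or directory the class was loaded from, or null if unknown (e.g. JDK bootstrap classes).
    static String locationOf(String className) {
        Class<?> c = find(className);
        if (c == null) {
            return null;
        }
        CodeSource source = c.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return (location == null) ? null : location.toString();
    }

    public static void main(String[] args) {
        String queues = "com.google.common.collect.Queues";
        System.out.println("Queues loaded from: " + locationOf(queues));
        System.out.println("Queues.newArrayDeque() present: " + hasPublicMethod(queues, "newArrayDeque"));
    }
}
```

If the printed location is a Hadoop or CDH library directory rather than Mahout's job jar, and the method is reported missing, that matches the gotcha described above.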






Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
OK, so should I change the Guava version to 11.0.2 in the pom and
recompile?

Kévin Moulart



Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Sean Owen
If I'm right, that will cause compile errors, but you just fix those by
replacing the offending Guava constructs with equivalent plain Java or
older Guava code. IIRC it is fairly trivial.

In fact, Mahout probably should not use Guava 12+ methods for this reason,
even when compiling against 12+. And I actually thought someone had
already cleaned that up...

On Thu, Mar 6, 2014 at 3:34 PM, Kevin Moulart  wrote:
> Ok so should I try and recompile and change the guava version to 11.0.2 in
> the pom ?

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
Indeed, it causes compile errors:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project mahout-math: Compilation failure
[ERROR] /home/myCompny/Downloads/mahout9/math/src/main/java/org/apache/mahout/math/stats/GroupTree.java:[171,31] cannot find symbol
[ERROR] symbol:   method newArrayDeque()
[ERROR] location: class com.google.common.collect.Queues
So I'll dig into the code to find what to replace and an equivalent to use.

Thanks for your help !

Kévin Moulart



Re: Mahout with Storm/Spark

2014-03-06 Thread Ted Dunning
Which version are you using?


On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav wrote:

> Hi,
> I am using the Mahout LDA algorithm for topic modeling on a huge number of
> documents (500k or more). Mahout is taking a lot of time, so I am looking
> at other alternatives. I found this link
> (http://www.oracle.com/technetwork/articles/java/micro-1925135.html), where
> Storm is used with Mallet for real-time topic modeling. I want to know if
> anyone has tried Storm or Spark with Mahout to speed up the process.
>
> Thanks
> Vineet Yadav
>


Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Ted Dunning
On Thu, Mar 6, 2014 at 7:46 AM, Kevin Moulart wrote:

> [ERROR]
>
> /home/myCompny/Downloads/mahout9/math/src/main/java/org/apache/mahout/math/stats/GroupTree.java:[171,31]
> cannot find symbol
>

Replace that line with:

stack = new ArrayDeque();
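This one-line fix swaps Guava's Queues.newArrayDeque() factory for the plain JDK constructor, which has existed since Java 6 and so does not depend on which Guava version Hadoop puts on the classpath. The archive appears to have stripped the type argument from that line, and the element type GroupTree uses for its stack is not shown here, so the sketch below uses a hypothetical helper and a String stack purely to illustrate the pattern. GroupTree.java would also need an import of java.util.ArrayDeque, and the Queues import can presumably be dropped if nothing else in the file uses it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeCompat {
    // Before (requires a Guava version that has Queues.newArrayDeque() at runtime):
    //     Deque<T> stack = Queues.newArrayDeque();
    // After (JDK only; the explicit type argument also compiles on Java 6, which has no diamond operator):
    static <T> Deque<T> newStack() {
        return new ArrayDeque<T>();
    }

    public static void main(String[] args) {
        Deque<String> stack = newStack();
        stack.push("root");
        stack.push("left child");
        System.out.println(stack.pop());     // prints "left child" (last in, first out)
        System.out.println(stack.pop());     // prints "root"
        System.out.println(stack.isEmpty()); // prints "true"
    }
}
```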


Re: Rework our website

2014-03-06 Thread Scott C. Cote
OK, I expected that (and am actually pleased that it's not a free-for-all).

I'll look at what has already been updated in this latest flurry of
updates and see what I can contribute. Forwarded to you.

Thanks,

SCott

On 3/5/14, 4:43 PM, "Sebastian Schelter"  wrote:

>At the moment, only committers can change the website, unfortunately. If
>you have text to add, I'm happy to work it in and add your name to our
>contributors list in the CHANGELOG.
>
>Best,
>Sebastian
>
>
>On 03/05/2014 04:58 PM, Scott C. Cote wrote:
>> I had recently taken the text tour of Mahout, but I couldn't figure out a
>> way to contribute updates to the tour (some of the file names have
>> changed, etc.).
>>
>> How would I start? (This was part of my offer to help with the
>> documentation of Mahout.)
>>
>> SCott
>>
>> On 3/5/14 9:47 AM, "Pat Ferrel"  wrote:
>>
>>> What, no centered text??
>>>
>>> ;-)
>>>
>>> Love either.
>>>
>>> BTW, users are no longer able to contribute content to the wiki. Most
>>> CMSs have a way to allow moderated input. Might this make getting
>>> documentation help easier? Allow anyone to contribute, but committers
>>> can filter out the bad, sort of like submitting patches.
>>>
>>> On Mar 5, 2014, at 4:11 AM, Sebastian Schelter  wrote:
>>>
>>> Hi everyone,
>>>
>>> In our latest discussion, I argued that the lack of documentation on our
>>> website (and the errors in it) is one of the main pain points of Mahout
>>> atm. To be honest, I'm also not very happy with the design; the fonts and
>>> spacing in particular make it super hard to read long articles. This also
>>> discourages me from adding articles and documentation.
>>>
>>> I think we should have a beautiful website, where it is fun to add new
>>> stuff.
>>>
>>> My design skills are pretty limited, but fortunately my brother is an art
>>> director! I asked him to make our website a bit more beautiful without
>>> changing too much of the structure, so that a redesign wouldn't take too
>>> long.
>>>
>>> I really like the results and would volunteer to dig out my CSS skills
>>> and do the redesign, if people agree.
>>>
>>> Here are his drafts; I like the second one best:
>>>
>>> https://people.apache.org/~ssc/mahout/mahout.jpg
>>> https://people.apache.org/~ssc/mahout/mahout2.jpg
>>>
>>> Let me know what you think!
>>>
>>> Best,
>>> Sebastian
>>>
>>
>>
>




Reuters Example LDA Error (no help anywhere)

2014-03-06 Thread Cosmin Dumbrava
I don't know if it's OK to write to this address like this, but here it is.

I executed cluster-reuters.sh from the examples directory (version
1.0-SNAPSHOT), and at the end I only get a list like this:
21575
{0.02:0.6314297270431626,0.03:0.12547216143460152,0.007050:0.0806108337305,0.04:0.07121802301642256,0.025:0.0677648308012434,0.003:0.0221466872297289,0.06:4.4720109631453837E-4,0.01:4.0331445050718065E-4,0.077:1.0509017796402916E-4,0.1:6.868649426131684E-5}
21576
{0.055:0.7123345754234253,0.003:0.10345316403842542,0.025:0.07850931669910466,0.1:0.0688641506163345,0.06:0.010599081492449824,0:0.0081953368778766,0.04:0.00469907695241742,0.03:0.003966985061879055,0.07:0.002197060890631658,0.0625:0.0020741956232281466}
21577
{0.04:0.5277733526037044,0.01:0.46656672162804314,0.07:0.0024295914763474164,0.1:0.002243674469679058,0.077:8.012577174900807E-4,0.007050:3.9184997476998896E-5,0.03:3.2141106779800255E-5,0.0625:2.4665616652494003E-5,0.02:1.949377177063371E-5,0.025:1.3329985998932362E-5}
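Each document id above is followed by a sparse vector printed as {label:weight,...}, apparently sorted by decreasing weight and truncated to ten entries (presumably the effect of the -sort and -vs 10 options in the vectordump call below). To inspect such a dump programmatically, here is a minimal Java sketch that parses one vector line and returns its heaviest entry; the class name VectorDumpLine is hypothetical, and it assumes exactly the one-line format shown, with labels that contain no commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorDumpLine {
    // Parses "{label:weight,label:weight,...}" into a map that keeps the printed order.
    static Map<String, Double> parse(String line) {
        String body = line.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("expected {label:weight,...} but got: " + line);
        }
        body = body.substring(1, body.length() - 1);
        Map<String, Double> entries = new LinkedHashMap<String, Double>();
        if (body.isEmpty()) {
            return entries;
        }
        for (String pair : body.split(",")) {
            int sep = pair.lastIndexOf(':'); // weights contain no ':', so split on the last one
            if (sep < 0) {
                throw new IllegalArgumentException("missing ':' in entry: " + pair);
            }
            entries.put(pair.substring(0, sep).trim(), Double.parseDouble(pair.substring(sep + 1).trim()));
        }
        return entries;
    }

    // Label with the largest weight, or null for an empty vector.
    static String heaviestLabel(Map<String, Double> entries) {
        String best = null;
        double bestWeight = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : entries.entrySet()) {
            if (e.getValue() > bestWeight) {
                best = e.getKey();
                bestWeight = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A shortened copy of the vector printed for document 21576 above.
        Map<String, Double> v = parse("{0.055:0.7123345754234253,0.003:0.10345316403842542,0.025:0.07850931669910466}");
        System.out.println("entries: " + v.size() + ", heaviest label: " + heaviestLabel(v)); // entries: 3, heaviest label: 0.055
    }
}
```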


$MAHOUT cvb \
    -i ${WORK_DIR}/reuters-out-matrix/matrix \
    -o ${WORK_DIR}/reuters-lda -k 20 -ow -x 20 \
    -dict ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
    -dt ${WORK_DIR}/reuters-lda-topics \
    -mt ${WORK_DIR}/reuters-lda-model \
  && \
  $MAHOUT vectordump \
    -i ${WORK_DIR}/reuters-lda-topics/part-m-0 \
    -o ${WORK_DIR}/reuters-lda/vectordump \
    -vs 10 -p true \
    -d ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
    -dt sequencefile -sort ${WORK_DIR}/reuters-lda-topics/part-m-0 \
    && \

Do I need to do something else to get usable output from this?

The same thing happens when I try to implement it on my own.


Thanks in advance


Re: Reuters Example LDA Error (no help anywhere)

2014-03-06 Thread Suneel Marthi
The script needs to be corrected so that it does not call vectordump for
LDA, as the vectordump utility (and even clusterdump) is presently not
capable of displaying topics and the relevant documents. I recall this issue
was previously reported by Peyman Faratin after the 0.9 release.

What Mahout is missing is a clusterdump utility that reads in the LDA topics
and the Document to DocumentId mapping, and displays a report of the topics
and the documents that belong to each topic.

Meanwhile, to see the generated topics and documents, please refer to this
blog post:
http://sujitpal.blogspot.com/2013/10/topic-modeling-with-mahout-on-amazon-emr.html

Let me file a JIRA for this.
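Until such a utility exists, the core of the report described here is simple once the data is in memory: for each topic, rank the documents by their weight for that topic. A minimal sketch, assuming the per-document topic distributions (presumably what cvb writes to the -dt directory) and the document names have already been read into a map; reading the SequenceFiles is left out, and TopicReport and topDocumentsPerTopic are hypothetical names, not Mahout APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopicReport {
    // For each topic index, the names of the topN documents with the highest weight for that topic.
    static List<List<String>> topDocumentsPerTopic(Map<String, double[]> docTopics, int numTopics, int topN) {
        List<List<String>> report = new ArrayList<>();
        for (int t = 0; t < numTopics; t++) {
            final int topic = t;
            List<String> docs = new ArrayList<>(docTopics.keySet());
            // Descending by weight; List.sort is stable, so ties keep the map's iteration order.
            docs.sort((a, b) -> Double.compare(docTopics.get(b)[topic], docTopics.get(a)[topic]));
            report.add(new ArrayList<>(docs.subList(0, Math.min(topN, docs.size()))));
        }
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: three documents, two topics.
        Map<String, double[]> docTopics = new LinkedHashMap<>();
        docTopics.put("doc-a", new double[] {0.9, 0.1});
        docTopics.put("doc-b", new double[] {0.2, 0.8});
        docTopics.put("doc-c", new double[] {0.6, 0.4});

        List<List<String>> report = topDocumentsPerTopic(docTopics, 2, 2);
        for (int t = 0; t < report.size(); t++) {
            System.out.println("Topic " + t + ": " + report.get(t));
        }
        // Topic 0: [doc-a, doc-c]
        // Topic 1: [doc-b, doc-c]
    }
}
```

Sorting the full document list once per topic is fine for a sketch; for the 500k-document scale mentioned elsewhere in this archive, a bounded priority queue per topic would avoid the repeated full sorts.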






Re: Reuters Example LDA Error (no help anywhere)

2014-03-06 Thread Suneel Marthi
Typo in my previous email; that paragraph should read:

"What Mahout is missing is a clusterdump-like utility that reads in the LDA
topics and the Document to DocumentId mapping, and displays a report of the
topics and the documents that belong to each topic."




On Thursday, March 6, 2014 7:06 PM, Suneel Marthi  
wrote:
 
The script needs to be corrected so that it does not call vectordump for LDA,
as the vectordump utility (and clusterdump too) is presently not capable of
displaying topics and their relevant documents. I recall this issue was
previously reported by Peyman Faratin after the 0.9 release.

Ideally, Mahout would have a clusterdump utility that reads in LDA topics and
the Document-to-DocumentId mapping and displays a report of the topics and the
documents that belong to each topic.

Meanwhile, to see the generated topics and documents, please refer to
this blog:
http://sujitpal.blogspot.com/2013/10/topic-modeling-with-mahout-on-amazon-emr.html

Let me file a JIRA for this.







Re: Reuters Example LDA Error (no help anywhere)

2014-03-06 Thread Cosinus WebDev
Hi,

Thank you for the answer; now I can rest for a second :)

I hope this will be fixed soon. If you file a JIRA, please send me the link so
I can follow the result.

Thank you again,

And one or two more questions:
1. Does vectordumping the cvb result (/work/out/cvb) give the terms in each topic?
2. Should the topics directory (/work/out/topics) contain the "best" terms
from all topics?

bin/mahout cvb \
  -i /work/matrix \
  -o /work/out/cvb -k 100 -ow -x 20 \
  -dt /work/out/topics
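The distinction behind these two questions can be sketched as follows, assuming (as I understand cvb) that the -o output holds the topic-term model, one row per topic with weights over dictionary term ids, while -dt holds one row per document with weights over topics, and so contains no terms at all. Under that assumption, the "best" terms of a topic come from sorting that topic's row of the -o model. TopTerms is a hypothetical name, and the row and dictionary are assumed to be already loaded into memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: top-weighted terms of one topic row; not a Mahout API. */
final class TopTerms {
    private TopTerms() {}

    /**
     * Returns the n highest-weighted terms of a single topic.
     *
     * @param topicRow   weights over term ids for one topic (assumed loaded from the -o model)
     * @param dictionary term id -> term string (assumed loaded from the dictionary file)
     * @param n          how many terms to return (clamped to the row length)
     */
    static List<String> topN(double[] topicRow, Map<Integer, String> dictionary, int n) {
        List<Integer> termIds = new ArrayList<>();
        for (int i = 0; i < topicRow.length; i++) termIds.add(i);
        // Sort term ids by descending weight; the sort is stable, so ties keep ascending id order.
        termIds.sort((a, b) -> Double.compare(topicRow[b], topicRow[a]));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, termIds.size()); i++) {
            int id = termIds.get(i);
            result.add(dictionary.getOrDefault(id, "term-" + id));
        }
        return result;
    }
}
```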







[blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-06 Thread Koji Sekiguchi
Hello,

I just posted an article on Comparing Document Classification Functions
of Lucene and Mahout.

http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

Comments are welcome. :)

Thanks!

koji