[jira] [Commented] (MAHOUT-1539) Implement affinity matrix computation in Mahout DSL

2015-03-30 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387993#comment-14387993
 ] 

Andrew Musselman commented on MAHOUT-1539:
--

I'd say start with something commonly used, like vectors.

Please make a pull request as soon as you can so we can look at actual code 
rather than just concepts, then develop from there.

> Implement affinity matrix computation in Mahout DSL
> ---
>
> Key: MAHOUT-1539
> URL: https://issues.apache.org/jira/browse/MAHOUT-1539
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Shannon Quinn
>Assignee: Shannon Quinn
>  Labels: DSL, scala, spark
> Fix For: 0.10.1
>
> Attachments: ComputeAffinities.scala
>
>
> This has the same goal as MAHOUT-1506, but rather than code the pairwise 
> computations in MapReduce, this will be done in the Mahout DSL.
> An orthogonal issue is the format of the raw input (vectors, text, images, 
> SequenceFiles), and how the user specifies the distance equation and any 
> associated parameters.
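
For illustration, a minimal sketch of the pairwise affinity computation described
above, written in plain Scala rather than the actual Mahout DSL (the names and the
Gaussian-kernel choice here are hypothetical, with the distance function and its
parameter supplied by the user):

    // Illustrative only: plain Scala, not the Mahout DSL; all names are hypothetical.
    object ComputeAffinitiesSketch {

      // A user-supplied distance, here squared Euclidean distance.
      def squaredEuclidean(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

      // Gaussian (RBF) affinity: A(i, j) = exp(-d(x_i, x_j) / (2 * sigma^2)),
      // with the diagonal zeroed out, as is common for spectral clustering.
      def affinityMatrix(points: IndexedSeq[Array[Double]],
                         distance: (Array[Double], Array[Double]) => Double,
                         sigma: Double): Array[Array[Double]] = {
        val denom = 2.0 * sigma * sigma
        Array.tabulate(points.length, points.length) { (i, j) =>
          if (i == j) 0.0 else math.exp(-distance(points(i), points(j)) / denom)
        }
      }

      def main(args: Array[String]): Unit = {
        val pts = IndexedSeq(Array(0.0, 0.0), Array(1.0, 0.0), Array(10.0, 10.0))
        affinityMatrix(pts, squaredEuclidean, sigma = 1.0)
          .foreach(row => println(row.mkString(" ")))
      }
    }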



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1539) Implement affinity matrix computation in Mahout DSL

2015-03-30 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387984#comment-14387984
 ] 

Saikat Kanjilal commented on MAHOUT-1539:
-

So I did some more research and have some questions; I have added them to the 
JIRA as well:

1) Are we going to deal with images or text data to start?
2) What do we really mean by a data point? In my mind it is represented by an (x, y) pair.
3) I think the similarity measure used for locality-sensitive hashing should be 
configurable; namely, we should be able to plug in Jaccard, Euclidean, or cosine 
similarity as functions to be computed.

I have a sample locality-sensitive hashing scheme coded up in Scala but want 
further clarification on the above before I proceed.

Thanks for your help.
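
As a minimal sketch of point 3 above (plain Scala, hypothetical names, not an
existing Mahout interface), a pluggable similarity measure could look like this:

    // Hypothetical sketch of a pluggable similarity measure.
    trait SimilarityMeasure {
      def similarity(a: Array[Double], b: Array[Double]): Double
    }

    object CosineSimilarity extends SimilarityMeasure {
      def similarity(a: Array[Double], b: Array[Double]): Double = {
        val dot   = a.zip(b).map { case (x, y) => x * y }.sum
        val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
        if (norms == 0.0) 0.0 else dot / norms
      }
    }

    object JaccardSimilarity extends SimilarityMeasure {
      // Treats non-zero entries as set membership, a common reading of Jaccard on vectors.
      def similarity(a: Array[Double], b: Array[Double]): Double = {
        val sa = a.indices.filter(a(_) != 0.0).toSet
        val sb = b.indices.filter(b(_) != 0.0).toSet
        val union = (sa union sb).size
        if (union == 0) 0.0 else (sa intersect sb).size.toDouble / union
      }
    }

    // An LSH or affinity job could then take the measure as a parameter:
    class PairwiseSimilarityJob(measure: SimilarityMeasure) {
      def pairwise(points: Seq[Array[Double]]): Seq[Seq[Double]] =
        points.map(p => points.map(q => measure.similarity(p, q)))
    }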

> Implement affinity matrix computation in Mahout DSL
> ---
>
> Key: MAHOUT-1539
> URL: https://issues.apache.org/jira/browse/MAHOUT-1539
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Shannon Quinn
>Assignee: Shannon Quinn
>  Labels: DSL, scala, spark
> Fix For: 0.10.1
>
> Attachments: ComputeAffinities.scala
>
>
> This has the same goal as MAHOUT-1506, but rather than code the pairwise 
> computations in MapReduce, this will be done in the Mahout DSL.
> An orthogonal issue is the format of the raw input (vectors, text, images, 
> SequenceFiles), and how the user specifies the distance equation and any 
> associated parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MAHOUT-1539) Implement affinity matrix computation in Mahout DSL

2015-03-30 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387984#comment-14387984
 ] 

Saikat Kanjilal edited comment on MAHOUT-1539 at 3/31/15 4:47 AM:
--

So I did some more research and have some questions:

1) Are we going to deal with images or text data to start?
2) What do we really mean by a data point? In my mind it is represented by an (x, y) pair.
3) I think the similarity measure used for locality-sensitive hashing should be 
configurable; namely, we should be able to plug in Jaccard, Euclidean, or cosine 
similarity as functions to be computed.

I have a sample locality-sensitive hashing scheme coded up in Scala but want 
further clarification on the above before I proceed.

Thanks for your help.


was (Author: kanjilal):
So I did some more research and have some questions; I have added them to the 
JIRA as well:

1) Are we going to deal with images or text data to start?
2) What do we really mean by a data point? In my mind it is represented by an (x, y) pair.
3) I think the similarity measure used for locality-sensitive hashing should be 
configurable; namely, we should be able to plug in Jaccard, Euclidean, or cosine 
similarity as functions to be computed.

I have a sample locality-sensitive hashing scheme coded up in Scala but want 
further clarification on the above before I proceed.

Thanks for your help.

> Implement affinity matrix computation in Mahout DSL
> ---
>
> Key: MAHOUT-1539
> URL: https://issues.apache.org/jira/browse/MAHOUT-1539
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Shannon Quinn
>Assignee: Shannon Quinn
>  Labels: DSL, scala, spark
> Fix For: 0.10.1
>
> Attachments: ComputeAffinities.scala
>
>
> This has the same goal as MAHOUT-1506, but rather than code the pairwise 
> computations in MapReduce, this will be done in the Mahout DSL.
> An orthogonal issue is the format of the raw input (vectors, text, images, 
> SequenceFiles), and how the user specifies the distance equation and any 
> associated parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1602) Euclidean Distance Similarity Math

2015-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387949#comment-14387949
 ] 

Hudson commented on MAHOUT-1602:


SUCCESS: Integrated in Mahout-Quality #3032 (See 
[https://builds.apache.org/job/Mahout-Quality/3032/])
MAHOUT-1602: Euclidean Distance Similarity Math, this also closes #60 
(suneel.marthi: rev 57429b176a94514acce5f6985a1299fccb8d115f)
* 
mrlegacy/src/main/java/org/apache/mahout/cf/taste/impl/similarity/EuclideanDistanceSimilarity.java
* CHANGELOG


> Euclidean Distance Similarity Math 
> ---
>
> Key: MAHOUT-1602
> URL: https://issues.apache.org/jira/browse/MAHOUT-1602
> Project: Mahout
>  Issue Type: Bug
>  Components: Collaborative Filtering, Math
>Affects Versions: 0.9
>Reporter: Leonardo Fernandez Sanchez
>Assignee: Stevo Slavic
>Priority: Minor
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Within the file
> /mrlegacy/src/main/java/org/apache/mahout/cf/taste/impl/similarity/EuclideanDistanceSimilarity.java
> the comments say the implementation should be sqrt(n) / (1 + distance).
> Rewritten as a reciprocal, that is:
> 1 / ((1 + distance) / sqrt(n))
> which in code would be:
> return 1.0 / ((1.0 + Math.sqrt(sumXYdiff2)) / Math.sqrt(n));
> But what is actually coded (missing grouping brackets) is:
> 1 / (1 + distance / sqrt(n))
> i.e.:
> return 1.0 / (1.0 + Math.sqrt(sumXYdiff2) / Math.sqrt(n));
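
As a small worked example of the difference (plain Scala, with a hypothetical squared
distance of 4.0 over n = 9 dimensions, mirroring the variable names above):

    val sumXYdiff2 = 4.0
    val n = 9.0

    // Documented: sqrt(n) / (1 + distance), i.e. 1 / ((1 + distance) / sqrt(n))
    val documented = 1.0 / ((1.0 + math.sqrt(sumXYdiff2)) / math.sqrt(n))  // 3.0 / 3.0 = 1.0

    // Actually coded (missing grouping brackets): 1 / (1 + distance / sqrt(n))
    val actual = 1.0 / (1.0 + math.sqrt(sumXYdiff2) / math.sqrt(n))        // 1 / (1 + 2/3) = 0.6

    println(s"documented = $documented, actual = $actual")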



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1602) Euclidean Distance Similarity Math

2015-03-30 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1602.
---
Resolution: Fixed

> Euclidean Distance Similarity Math 
> ---
>
> Key: MAHOUT-1602
> URL: https://issues.apache.org/jira/browse/MAHOUT-1602
> Project: Mahout
>  Issue Type: Bug
>  Components: Collaborative Filtering, Math
>Affects Versions: 0.9
>Reporter: Leonardo Fernandez Sanchez
>Assignee: Stevo Slavic
>Priority: Minor
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Within the file
> /mrlegacy/src/main/java/org/apache/mahout/cf/taste/impl/similarity/EuclideanDistanceSimilarity.java
> the comments say the implementation should be sqrt(n) / (1 + distance).
> Rewritten as a reciprocal, that is:
> 1 / ((1 + distance) / sqrt(n))
> which in code would be:
> return 1.0 / ((1.0 + Math.sqrt(sumXYdiff2)) / Math.sqrt(n));
> But what is actually coded (missing grouping brackets) is:
> 1 / (1 + distance / sqrt(n))
> i.e.:
> return 1.0 / (1.0 + Math.sqrt(sumXYdiff2) / Math.sqrt(n));



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1619) HighDFWordsPruner overwrites cache files

2015-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387910#comment-14387910
 ] 

Hudson commented on MAHOUT-1619:


SUCCESS: Integrated in Mahout-Quality #3031 (See 
[https://builds.apache.org/job/Mahout-Quality/3031/])
MAHOUT-1619: HighDFWordsPruner overwrites cache files, this fixes #57 
(suneel.marthi: rev 5624a96c6da94ad1c95077dbbdab71409d85770c)
* mrlegacy/src/main/java/org/apache/mahout/vectorizer/HighDFWordsPruner.java
* 
mrlegacy/src/main/java/org/apache/mahout/vectorizer/collocations/llr/CollocMapper.java
* CHANGELOG


> HighDFWordsPruner overwrites cache files
> 
>
> Key: MAHOUT-1619
> URL: https://issues.apache.org/jira/browse/MAHOUT-1619
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.9
>Reporter: Burke Webster
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: legacy
> Fix For: 0.10.0
>
>
> HighDFWordsPruner uses DistributedCache.setCacheFiles which will overwrite 
> any files already in the cache.  Per the fix in MAHOUT-1498 we should be 
> using addCacheFile, which will not overwrite existing cache files.
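
For reference, a short sketch of the API difference being described (Hadoop's
DistributedCache, called from Scala here for illustration; the paths are hypothetical
and this is not the actual HighDFWordsPruner patch):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.filecache.DistributedCache

    val conf = new Configuration()

    // setCacheFiles replaces whatever was already registered in the cache for the job:
    DistributedCache.setCacheFiles(Array(new URI("/tmp/dictionary.file-0")), conf)

    // addCacheFile appends, preserving files registered earlier during job setup:
    DistributedCache.addCacheFile(new URI("/tmp/frequency.file-0"), conf)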



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1602) Euclidean Distance Similarity Math

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387889#comment-14387889
 ] 

ASF GitHub Bot commented on MAHOUT-1602:


Github user smarthi commented on the pull request:

https://github.com/apache/mahout/pull/60#issuecomment-87917988
  
This doesn't seem correct; there is a correction that needs to be made in 
the code comments, which is to be addressed by MAHOUT-1602. Will close this PR 
along with that Jira.


> Euclidean Distance Similarity Math 
> ---
>
> Key: MAHOUT-1602
> URL: https://issues.apache.org/jira/browse/MAHOUT-1602
> Project: Mahout
>  Issue Type: Bug
>  Components: Collaborative Filtering, Math
>Affects Versions: 0.9
>Reporter: Leonardo Fernandez Sanchez
>Assignee: Stevo Slavic
>Priority: Minor
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Within the file
> /mrlegacy/src/main/java/org/apache/mahout/cf/taste/impl/similarity/EuclideanDistanceSimilarity.java
> the comments say the implementation should be sqrt(n) / (1 + distance).
> Rewritten as a reciprocal, that is:
> 1 / ((1 + distance) / sqrt(n))
> which in code would be:
> return 1.0 / ((1.0 + Math.sqrt(sumXYdiff2)) / Math.sqrt(n));
> But what is actually coded (missing grouping brackets) is:
> 1 / (1 + distance / sqrt(n))
> i.e.:
> return 1.0 / (1.0 + Math.sqrt(sumXYdiff2) / Math.sqrt(n));



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1619) HighDFWordsPruner overwrites cache files

2015-03-30 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1619.
---
Resolution: Fixed

> HighDFWordsPruner overwrites cache files
> 
>
> Key: MAHOUT-1619
> URL: https://issues.apache.org/jira/browse/MAHOUT-1619
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.9
>Reporter: Burke Webster
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: legacy
> Fix For: 0.10.0
>
>
> HighDFWordsPruner uses DistributedCache.setCacheFiles which will overwrite 
> any files already in the cache.  Per the fix in MAHOUT-1498 we should be 
> using addCacheFile, which will not overwrite existing cache files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1619) HighDFWordsPruner overwrites cache files

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387880#comment-14387880
 ] 

ASF GitHub Bot commented on MAHOUT-1619:


Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/57


> HighDFWordsPruner overwrites cache files
> 
>
> Key: MAHOUT-1619
> URL: https://issues.apache.org/jira/browse/MAHOUT-1619
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.9
>Reporter: Burke Webster
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: legacy
> Fix For: 0.10.0
>
>
> HighDFWordsPruner uses DistributedCache.setCacheFiles which will overwrite 
> any files already in the cache.  Per the fix in MAHOUT-1498 we should be 
> using addCacheFile, which will not overwrite existing cache files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MAHOUT-1522) Handle logging levels via log4j.xml

2015-03-30 Thread Andrew Musselman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Musselman updated MAHOUT-1522:
-
Comment: was deleted

(was: [~dlyubimov] Does the existing WARN level actually work for Spark stuff?)

> Handle logging levels via log4j.xml
> ---
>
> Key: MAHOUT-1522
> URL: https://issues.apache.org/jira/browse/MAHOUT-1522
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Critical
>  Labels: legacy, scala
> Fix For: 0.10.0
>
>
> We don't have a properties file to tell log4j what to do, so we inherit other 
> frameworks' settings.
> Suggestion is to add a log4j.xml file in a canonical place and set up logging 
> levels, maybe separating out components for ease of setting levels during 
> debugging.
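
For illustration, a minimal log4j.xml of the kind suggested above (logger names and
levels are placeholders, not a committed configuration):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <!-- Illustrative only: logger names and levels are placeholders. -->
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
      <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d{ISO8601} %-5p %c{2} - %m%n"/>
        </layout>
      </appender>

      <!-- Separate components so levels can be adjusted individually while debugging. -->
      <logger name="org.apache.mahout">
        <level value="INFO"/>
      </logger>
      <logger name="org.apache.spark">
        <level value="WARN"/>
      </logger>

      <root>
        <priority value="WARN"/>
        <appender-ref ref="console"/>
      </root>
    </log4j:configuration>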



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1522) Handle logging levels via log4j.xml

2015-03-30 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387685#comment-14387685
 ] 

Andrew Musselman commented on MAHOUT-1522:
--

[~dlyubimov] Does the existing WARN level actually work for Spark stuff?

> Handle logging levels via log4j.xml
> ---
>
> Key: MAHOUT-1522
> URL: https://issues.apache.org/jira/browse/MAHOUT-1522
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Critical
>  Labels: legacy, scala
> Fix For: 0.10.0
>
>
> We don't have a properties file to tell log4j what to do, so we inherit other 
> frameworks' settings.
> Suggestion is to add a log4j.xml file in a canonical place and set up logging 
> levels, maybe separating out components for ease of setting levels during 
> debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: MapR repo might need to be updated

2015-03-30 Thread Ted Dunning
(moving dev@mahout to bcc since this is not of widespread interest)

Stevo,

Here is what our builds guy says:

> Our version of nexus is 2.3.1.  The last update to the repo was Friday.
> Because the error listed a cookie issue, I restarted apache. I have two
> builds building right now and pulling from the repo, no issues, yet.


Can you say if the problem persists?



On Mon, Mar 30, 2015 at 2:34 PM, Stevo Slavić  wrote:

> Hello Ted,
>
> MapR Maven repository manager, seems to be Nexus, and it seems to be
> version 2.11.1 or older with this bug still in it:
> https://issues.sonatype.org/browse/NEXUS-7877
>
> Mahout build uses MapR Maven repository, and for all artifacts/dependencies
> resolved from it, build output is polluted with warnings like:
>
>
> Downloading:
> http://repository.mapr.com/maven/org/apache/apache/16/apache-16.pom
> Mar 30, 2015 11:20:48 PM
>
> org.apache.maven.wagon.providers.http.httpclient.client.protocol.ResponseProcessCookies
> processCookies
> WARNING: Cookie rejected [rememberMe="deleteMe", version:0, domain:
> repository.mapr.com, path:/nexus, expiry:Mon Mar 30 23:20:48 CEST 2015]
> Illegal path attribute "/nexus". Path of origin:
> "/maven/org/apache/apache/16/apache-16.pom"
>
>
> Please consider having it updated.
>
> Kind regards,
> Stevo Slavic.
>


Re: Anyone using eclipse?

2015-03-30 Thread Ted Dunning
Idea here as well.



On Mon, Mar 30, 2015 at 4:52 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Idea here
>
> On Mon, Mar 30, 2015 at 4:42 PM, Andrew Palumbo 
> wrote:
>
> > also using idea
> >
> >
> > On 03/30/2015 07:18 PM, Dmitriy Lyubimov wrote:
> >
> >> I switched to idea since i started doing mixed projects with scala.
> >> Standalone scala is bearable in eclipse but mixed projects simply don't
> >> work. (and Mahout likely one of them).
> >>
> >> On Mon, Mar 30, 2015 at 3:58 PM, Suneel Marthi  >
> >> wrote:
> >>
> >>  I believe its only Shannon from amongst the committer team who is using
> >>> Eclipse. I am talking him out into shifting to IntelliJ.
> >>>
> >>> On Mon, Mar 30, 2015 at 6:54 PM, Stevo Slavić 
> wrote:
> >>>
> >>>  Hello team,
> 
>  I'm curious, is anyone of you using eclipse IDE?
>  If not, then as part of MAHOUT-1278 I could remove a lot from our
> POMs.
> 
>  Kind regards,
>  Stevo Slavic.
> 
> 
> >
>


Re: MapR repo might need to be updated

2015-03-30 Thread Ted Dunning
Thanks.

On it.

On Mon, Mar 30, 2015 at 2:34 PM, Stevo Slavić  wrote:

> Hello Ted,
>
> MapR Maven repository manager, seems to be Nexus, and it seems to be
> version 2.11.1 or older with this bug still in it:
> https://issues.sonatype.org/browse/NEXUS-7877
>
> Mahout build uses MapR Maven repository, and for all artifacts/dependencies
> resolved from it, build output is polluted with warnings like:
>
>
> Downloading:
> http://repository.mapr.com/maven/org/apache/apache/16/apache-16.pom
> Mar 30, 2015 11:20:48 PM
>
> org.apache.maven.wagon.providers.http.httpclient.client.protocol.ResponseProcessCookies
> processCookies
> WARNING: Cookie rejected [rememberMe="deleteMe", version:0, domain:
> repository.mapr.com, path:/nexus, expiry:Mon Mar 30 23:20:48 CEST 2015]
> Illegal path attribute "/nexus". Path of origin:
> "/maven/org/apache/apache/16/apache-16.pom"
>
>
> Please consider having it updated.
>
> Kind regards,
> Stevo Slavic.
>


Re: Anyone using eclipse?

2015-03-30 Thread Andrew Musselman
Idea here

On Mon, Mar 30, 2015 at 4:42 PM, Andrew Palumbo  wrote:

> also using idea
>
>
> On 03/30/2015 07:18 PM, Dmitriy Lyubimov wrote:
>
>> I switched to idea since i started doing mixed projects with scala.
>> Standalone scala is bearable in eclipse but mixed projects simply don't
>> work. (and Mahout likely one of them).
>>
>> On Mon, Mar 30, 2015 at 3:58 PM, Suneel Marthi 
>> wrote:
>>
>>  I believe its only Shannon from amongst the committer team who is using
>>> Eclipse. I am talking him out into shifting to IntelliJ.
>>>
>>> On Mon, Mar 30, 2015 at 6:54 PM, Stevo Slavić  wrote:
>>>
>>>  Hello team,

 I'm curious, is anyone of you using eclipse IDE?
 If not, then as part of MAHOUT-1278 I could remove a lot from our POMs.

 Kind regards,
 Stevo Slavic.


>


Re: Anyone using eclipse?

2015-03-30 Thread Andrew Palumbo

also using idea

On 03/30/2015 07:18 PM, Dmitriy Lyubimov wrote:

I switched to idea since i started doing mixed projects with scala.
Standalone scala is bearable in eclipse but mixed projects simply don't
work. (and Mahout likely one of them).

On Mon, Mar 30, 2015 at 3:58 PM, Suneel Marthi 
wrote:


I believe its only Shannon from amongst the committer team who is using
Eclipse. I am talking him out into shifting to IntelliJ.

On Mon, Mar 30, 2015 at 6:54 PM, Stevo Slavić  wrote:


Hello team,

I'm curious, is anyone of you using eclipse IDE?
If not, then as part of MAHOUT-1278 I could remove a lot from our POMs.

Kind regards,
Stevo Slavic.





Re: Anyone using eclipse?

2015-03-30 Thread Shannon Quinn
Unsuccessfully thus far, but yes I'm on eclipse. 

iPhone'd

> On Mar 30, 2015, at 18:58, Suneel Marthi  wrote:
> 
> I believe its only Shannon from amongst the committer team who is using
> Eclipse. I am talking him out into shifting to IntelliJ.
> 
>> On Mon, Mar 30, 2015 at 6:54 PM, Stevo Slavić  wrote:
>> 
>> Hello team,
>> 
>> I'm curious, is anyone of you using eclipse IDE?
>> If not, then as part of MAHOUT-1278 I could remove a lot from our POMs.
>> 
>> Kind regards,
>> Stevo Slavic.
>> 


[jira] [Commented] (MAHOUT-1661) Deprecate Lanczos in the code base

2015-03-30 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387579#comment-14387579
 ] 

Suneel Marthi commented on MAHOUT-1661:
---

After talking to Dmitriy, we agreed that we can leave Lanczos in the codebase 
as long as it's marked 'deprecated'. Shannon, you can mark this 'Resolved' if the 
code changes have been committed to master. Thanks.

> Deprecate Lanczos in the code base
> --
>
> Key: MAHOUT-1661
> URL: https://issues.apache.org/jira/browse/MAHOUT-1661
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Suneel Marthi
>Assignee: Shannon Quinn
>Priority: Critical
> Fix For: 0.10.0
>
>
> Lanczos has long been deprecated from the code base but the code doesn't 
> reflect that.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Anyone using eclipse?

2015-03-30 Thread Dmitriy Lyubimov
I switched to IDEA since I started doing mixed projects with Scala.
Standalone Scala is bearable in Eclipse, but mixed projects simply don't
work (and Mahout is likely one of them).

On Mon, Mar 30, 2015 at 3:58 PM, Suneel Marthi 
wrote:

> I believe its only Shannon from amongst the committer team who is using
> Eclipse. I am talking him out into shifting to IntelliJ.
>
> On Mon, Mar 30, 2015 at 6:54 PM, Stevo Slavić  wrote:
>
> > Hello team,
> >
> > I'm curious, is anyone of you using eclipse IDE?
> > If not, then as part of MAHOUT-1278 I could remove a lot from our POMs.
> >
> > Kind regards,
> > Stevo Slavic.
> >
>


[jira] [Updated] (MAHOUT-1661) Deprecate Lanczos in the code base

2015-03-30 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1661:
--
Description: Lanczos has long been deprecated from the code base but the 
code doesn't reflect that.  (was: Lanczos has long been deprecated from the 
code base but the code doesn't reflect that.  Now that Spectral KMeans has been 
refactored to use SSVD, Lanczos can be purged.
)

> Deprecate Lanczos in the code base
> --
>
> Key: MAHOUT-1661
> URL: https://issues.apache.org/jira/browse/MAHOUT-1661
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Suneel Marthi
>Assignee: Shannon Quinn
>Priority: Critical
> Fix For: 0.10.0
>
>
> Lanczos has long been deprecated from the code base but the code doesn't 
> reflect that.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1661) Deprecate Lanczos in the code base

2015-03-30 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1661:
--
Summary: Deprecate Lanczos in the code base  (was: Remove Lanczos from the 
code base)

> Deprecate Lanczos in the code base
> --
>
> Key: MAHOUT-1661
> URL: https://issues.apache.org/jira/browse/MAHOUT-1661
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Suneel Marthi
>Assignee: Shannon Quinn
>Priority: Critical
> Fix For: 0.10.0
>
>
> Lanczos has long been deprecated from the code base but the code doesn't 
> reflect that.  Now that Spectral KMeans has been refactored to use SSVD, 
> Lanczos can be purged.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Anyone using eclipse?

2015-03-30 Thread Suneel Marthi
I believe it's only Shannon among the committer team who is using
Eclipse. I am talking him into shifting to IntelliJ.

On Mon, Mar 30, 2015 at 6:54 PM, Stevo Slavić  wrote:

> Hello team,
>
> I'm curious, is anyone of you using eclipse IDE?
> If not, then as part of MAHOUT-1278 I could remove a lot from our POMs.
>
> Kind regards,
> Stevo Slavic.
>


Anyone using eclipse?

2015-03-30 Thread Stevo Slavić
Hello team,

I'm curious, is anyone of you using eclipse IDE?
If not, then as part of MAHOUT-1278 I could remove a lot from our POMs.

Kind regards,
Stevo Slavic.


Re: [jira] [Work started] (MAHOUT-1522) Handle logging levels via log4j.xml

2015-03-30 Thread Andrew Palumbo

Cool, I'd thought this was marked for 0.10.1.

On 03/30/2015 05:29 PM, Andrew Musselman (JIRA) wrote:

  [ 
https://issues.apache.org/jira/browse/MAHOUT-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1522 started by Andrew Musselman.


Handle logging levels via log4j.xml
---

 Key: MAHOUT-1522
 URL: https://issues.apache.org/jira/browse/MAHOUT-1522
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.9
Reporter: Andrew Musselman
Assignee: Andrew Musselman
Priority: Critical
  Labels: legacy, scala
 Fix For: 0.10.0


We don't have a properties file to tell log4j what to do, so we inherit other 
frameworks' settings.
Suggestion is to add a log4j.xml file in a canonical place and set up logging 
levels, maybe separating out components for ease of setting levels during 
debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




MapR repo might need to be updated

2015-03-30 Thread Stevo Slavić
Hello Ted,

The MapR Maven repository manager seems to be Nexus, and it seems to be
version 2.11.1 or older, with this bug still in it:
https://issues.sonatype.org/browse/NEXUS-7877

Mahout build uses MapR Maven repository, and for all artifacts/dependencies
resolved from it, build output is polluted with warnings like:


Downloading:
http://repository.mapr.com/maven/org/apache/apache/16/apache-16.pom
Mar 30, 2015 11:20:48 PM
org.apache.maven.wagon.providers.http.httpclient.client.protocol.ResponseProcessCookies
processCookies
WARNING: Cookie rejected [rememberMe="deleteMe", version:0, domain:
repository.mapr.com, path:/nexus, expiry:Mon Mar 30 23:20:48 CEST 2015]
Illegal path attribute "/nexus". Path of origin:
"/maven/org/apache/apache/16/apache-16.pom"


Please consider having it updated.

Kind regards,
Stevo Slavic.


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387447#comment-14387447
 ] 

Andrew Musselman commented on MAHOUT-1655:
--

Need any help?

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1522) Handle logging levels via log4j.xml

2015-03-30 Thread Andrew Musselman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1522 started by Andrew Musselman.

> Handle logging levels via log4j.xml
> ---
>
> Key: MAHOUT-1522
> URL: https://issues.apache.org/jira/browse/MAHOUT-1522
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Critical
>  Labels: legacy, scala
> Fix For: 0.10.0
>
>
> We don't have a properties file to tell log4j what to do, so we inherit other 
> frameworks' settings.
> Suggestion is to add a log4j.xml file in a canonical place and set up logging 
> levels, maybe separating out components for ease of setting levels during 
> debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1278) Improve inheritance of Apache parent pom

2015-03-30 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1278 started by Stevo Slavic.

> Improve inheritance of Apache parent pom
> 
>
> Key: MAHOUT-1278
> URL: https://issues.apache.org/jira/browse/MAHOUT-1278
> Project: Mahout
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.8
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>Priority: Minor
>  Labels: legacy, scala
> Fix For: 0.10.0
>
>
> We should update the dependency on the Apache parent pom (currently we depend on 
> version 9, while 13 is already released).
> With the upgrade we should make the most of the settings and plugin versions 
> inherited from the Apache parent pom, so we override only what is necessary, to 
> make Mahout POMs smaller and easier to maintain.
> Hopefully by the time this issue gets worked on, a maven-remote-resources-plugin 
> release with the 
> [MRRESOURCES-53|http://jira.codehaus.org/browse/MRRESOURCES-53] fix will be 
> available (since we're affected by it - test jars are being resolved from the 
> remote repository instead of from the current build / reactor), and an 
> updated Apache parent pom released.
> Implementation note: the Mahout parent module and the mahout-buildtools module both 
> use the Apache parent pom as their parent, so both need to be updated. 
> The mahout-buildtools module had to be separate from the Mahout parent pom (not 
> inheriting it), so that the buildtools module can be referenced as a dependency of 
> various source quality check plugins.
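
For illustration, the kind of parent declaration change involved (version numbers as
mentioned above; the exact target version would be whatever Apache parent pom is
current when the work is done):

    <!-- In Mahout's top-level pom.xml: bump the Apache parent from 9 to 13. -->
    <parent>
      <groupId>org.apache</groupId>
      <artifactId>apache</artifactId>
      <version>13</version>
    </parent>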



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


MAHOUT_LOCAL Logging

2015-03-30 Thread Andrew Palumbo
When MAHOUT_LOCAL is set, the confusion matrix is not printed in the 
command-line classify-20newsgroups.sh example.

The original log level was info; I raised it to warn:

log.warn("{} Results: {}", hasOption("testComplementary") ? 
"Complementary" : "Standard NB", analyzer);

But it still isn't showing up.


Any ideas?


[jira] [Commented] (MAHOUT-1462) Cleaning up Random Forests documentation on Mahout website

2015-03-30 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387415#comment-14387415
 ] 

Andrew Musselman commented on MAHOUT-1462:
--

Any chance you could convert that Google doc to markdown, and suggest how to 
merge with https://mahout.apache.org/users/classification/breiman-example.html ?

> Cleaning up Random Forests documentation on Mahout website
> --
>
> Key: MAHOUT-1462
> URL: https://issues.apache.org/jira/browse/MAHOUT-1462
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Manoj Awasthi
>Assignee: Andrew Musselman
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Following are the items which need to be added or changed. 
> I think this page can be broken into two segments. The first could be the following: 
> 
> Introduction to Random Forests
> Random Forests are an ensemble machine learning technique originally proposed 
> by Leo Breiman (UCB) which uses classification and regression trees as the 
> underlying classification mechanism. The Random Forests trademark is held 
> by Leo Breiman and Adele Cutler. 
> Official website for Random Forests: 
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
> Original paper: http://oz.berkeley.edu/~breiman/randomforest2001.pdf
> 
> The second section could be the following: 
> 
> Classifying with Random Forests in Mahout
> 
> This section can be what is on the website right now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1559) Add documentation for and clean up the wikipedia classifier example

2015-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387410#comment-14387410
 ] 

Hudson commented on MAHOUT-1559:


SUCCESS: Integrated in Mahout-Quality #3030 (See 
[https://builds.apache.org/job/Mahout-Quality/3030/])
MAHOUT-1559: Clean up wikipedia classifier example closes apache/mahout#90 
(apalumbo: rev d5d8de1857d60b7b53b9baf0af6e7aea26bbde19)
* examples/bin/classify-wiki.sh
* examples/bin/classify-wikipedia.sh
* CHANGELOG


> Add documentation for and clean up the wikipedia classifier example
> ---
>
> Key: MAHOUT-1559
> URL: https://issues.apache.org/jira/browse/MAHOUT-1559
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
>Priority: Minor
>  Labels: DSL, legacy, scala
> Fix For: 0.10.0
>
>
> Add documentation for the wikipedia classifier example. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1516) run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.

2015-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387409#comment-14387409
 ] 

Hudson commented on MAHOUT-1516:


SUCCESS: Integrated in Mahout-Quality #3030 (See 
[https://builds.apache.org/job/Mahout-Quality/3030/])
MAHOUT-1516: classify-20newsgroups.sh failed: /tmp/mahout-work-jpan/20news-all 
does not exists in hdfs. (apalumbo: rev 
edec611f07d4e7a352a3332af271207652caab02)
* CHANGELOG
* examples/bin/classify-20newsgroups.sh


> run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all 
> does not exists in hdfs.
> --
>
> Key: MAHOUT-1516
> URL: https://issues.apache.org/jira/browse/MAHOUT-1516
> Project: Mahout
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.9
> Environment: hadoop2.2.0 mahout0.9 ubuntu12.04 
>Reporter: Jian Pan
>Assignee: Andrew Palumbo
>Priority: Minor
>  Labels: legacy, patch
> Fix For: 0.10.0
>
>
> + echo 'Copying 20newsgroups data to HDFS'
> Copying 20newsgroups data to HDFS
> + set +e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -rmr 
> /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> rmr: DEPRECATED: Please use 'rm -r' instead.
> 14/04/17 10:26:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rmr: `/tmp/mahout-work-jpan/20news-all': No such file or directory
> + set -e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -put 
> /tmp/mahout-work-jpan/20news-all /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> 14/04/17 10:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> put: `/tmp/mahout-work-jpan/20news-all': No such file or directory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1462) Cleaning up Random Forests documentation on Mahout website

2015-03-30 Thread Andrew Musselman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1462 started by Andrew Musselman.

> Cleaning up Random Forests documentation on Mahout website
> --
>
> Key: MAHOUT-1462
> URL: https://issues.apache.org/jira/browse/MAHOUT-1462
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Manoj Awasthi
>Assignee: Andrew Musselman
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Following are the items which need to be added or changed. 
> I think this page can be broken into two segments. The first could be the following: 
> 
> Introduction to Random Forests
> Random Forests are an ensemble machine learning technique originally proposed 
> by Leo Breiman (UCB) which uses classification and regression trees as the 
> underlying classification mechanism. The Random Forests trademark is held 
> by Leo Breiman and Adele Cutler. 
> Official website for Random Forests: 
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
> Original paper: http://oz.berkeley.edu/~breiman/randomforest2001.pdf
> 
> The second section could be the following: 
> 
> Classifying with Random Forests in Mahout
> 
> This section can be what is on the website right now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1516) run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.

2015-03-30 Thread Andrew Palumbo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Palumbo resolved MAHOUT-1516.

Resolution: Fixed

> run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all 
> does not exists in hdfs.
> --
>
> Key: MAHOUT-1516
> URL: https://issues.apache.org/jira/browse/MAHOUT-1516
> Project: Mahout
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.9
> Environment: hadoop2.2.0 mahout0.9 ubuntu12.04 
>Reporter: Jian Pan
>Assignee: Andrew Palumbo
>Priority: Minor
>  Labels: legacy, patch
> Fix For: 0.10.0
>
>
> + echo 'Copying 20newsgroups data to HDFS'
> Copying 20newsgroups data to HDFS
> + set +e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -rmr 
> /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> rmr: DEPRECATED: Please use 'rm -r' instead.
> 14/04/17 10:26:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rmr: `/tmp/mahout-work-jpan/20news-all': No such file or directory
> + set -e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -put 
> /tmp/mahout-work-jpan/20news-all /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> 14/04/17 10:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> put: `/tmp/mahout-work-jpan/20news-all': No such file or directory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1559) Add documentation for and clean up the wikipedia classifier example

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387332#comment-14387332
 ] 

ASF GitHub Bot commented on MAHOUT-1559:


Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/90


> Add documentation for and clean up the wikipedia classifier example
> ---
>
> Key: MAHOUT-1559
> URL: https://issues.apache.org/jira/browse/MAHOUT-1559
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
>Priority: Minor
>  Labels: DSL, legacy, scala
> Fix For: 0.10.0
>
>
> Add documentation for the wikipedia classifier example. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1559) Add documentation for and clean up the wikipedia classifier example

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387308#comment-14387308
 ] 

ASF GitHub Bot commented on MAHOUT-1559:


GitHub user andrewpalumbo opened a pull request:

https://github.com/apache/mahout/pull/90

MAHOUT-1559 Clean up and fix the wikipedia classification example



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewpalumbo/mahout MAHOUT-1559

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/90.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #90


commit 35ae825d6842ab05b0b5003572a2481560718b08
Author: Andrew Palumbo 
Date:   2015-03-30T19:57:17Z

Clean up and fix the wikipedia classification example




> Add documentation for and clean up the wikipedia classifier example
> ---
>
> Key: MAHOUT-1559
> URL: https://issues.apache.org/jira/browse/MAHOUT-1559
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
>Priority: Minor
>  Labels: DSL, legacy, scala
> Fix For: 0.10.0
>
>
> Add documentation for the wikipedia classifier example. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Reopened] (MAHOUT-1516) run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.

2015-03-30 Thread Andrew Palumbo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Palumbo reopened MAHOUT-1516:


Oops, spoke too soon; I must have added the HDFS directory by hand when I was 
walking through the step-by-step Naive Bayes example.  

We do need to create the base {{tmp/mahout-work-user}} directory in HDFS.

> run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all 
> does not exists in hdfs.
> --
>
> Key: MAHOUT-1516
> URL: https://issues.apache.org/jira/browse/MAHOUT-1516
> Project: Mahout
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.9
> Environment: hadoop2.2.0 mahout0.9 ubuntu12.04 
>Reporter: Jian Pan
>Assignee: Andrew Palumbo
>Priority: Minor
>  Labels: legacy, patch
> Fix For: 0.10.0
>
>
> + echo 'Copying 20newsgroups data to HDFS'
> Copying 20newsgroups data to HDFS
> + set +e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -rmr 
> /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> rmr: DEPRECATED: Please use 'rm -r' instead.
> 14/04/17 10:26:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rmr: `/tmp/mahout-work-jpan/20news-all': No such file or directory
> + set -e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -put 
> /tmp/mahout-work-jpan/20news-all /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> 14/04/17 10:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> put: `/tmp/mahout-work-jpan/20news-all': No such file or directory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MAHOUT-1516) run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.

2015-03-30 Thread Andrew Palumbo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387281#comment-14387281
 ] 

Andrew Palumbo edited comment on MAHOUT-1516 at 3/30/15 7:39 PM:
-

Oops, spoke too soon; I must have added the HDFS directory by hand when I was 
walking through the step-by-step Naive Bayes example.  

We do need to create the base {{tmp/mahout-work-user}} directory in HDFS.

Will do this today.


was (Author: andrew_palumbo):
Oops, spoke too soon; I must have added the HDFS directory by hand when I was 
walking through the step-by-step Naive Bayes example.  

We do need to create the base {{tmp/mahout-work-user}} directory in HDFS.

> run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all 
> does not exists in hdfs.
> --
>
> Key: MAHOUT-1516
> URL: https://issues.apache.org/jira/browse/MAHOUT-1516
> Project: Mahout
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.9
> Environment: hadoop2.2.0 mahout0.9 ubuntu12.04 
>Reporter: Jian Pan
>Assignee: Andrew Palumbo
>Priority: Minor
>  Labels: legacy, patch
> Fix For: 0.10.0
>
>
> + echo 'Copying 20newsgroups data to HDFS'
> Copying 20newsgroups data to HDFS
> + set +e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -rmr 
> /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> rmr: DEPRECATED: Please use 'rm -r' instead.
> 14/04/17 10:26:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rmr: `/tmp/mahout-work-jpan/20news-all': No such file or directory
> + set -e
> + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -put 
> /tmp/mahout-work-jpan/20news-all /tmp/mahout-work-jpan/20news-all
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> 14/04/17 10:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> put: `/tmp/mahout-work-jpan/20news-all': No such file or directory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAHOUT-1660) Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

2015-03-30 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov reassigned MAHOUT-1660:


Assignee: Dmitriy Lyubimov  (was: Suneel Marthi)

> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> --
>
> Key: MAHOUT-1660
> URL: https://issues.apache.org/jira/browse/MAHOUT-1660
> Project: Mahout
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.10.1
>Reporter: Suneel Marthi
>Assignee: Dmitriy Lyubimov
>Priority: Minor
> Fix For: 0.10.0
>
>
> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop configuration from 
> Context and not ignore it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1660) Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

2015-03-30 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387162#comment-14387162
 ] 

Dmitriy Lyubimov commented on MAHOUT-1660:
--

I have a fix for that. If you don't mind, I'll fix it for 0.10.1.

> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> --
>
> Key: MAHOUT-1660
> URL: https://issues.apache.org/jira/browse/MAHOUT-1660
> Project: Mahout
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.10.1
>Reporter: Suneel Marthi
>Assignee: Dmitriy Lyubimov
>Priority: Minor
> Fix For: 0.10.0
>
>
> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop configuration from 
> Context and not ignore it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1660) Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

2015-03-30 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov updated MAHOUT-1660:
-
Affects Version/s: (was: 0.10.0)
   0.10.1

> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> --
>
> Key: MAHOUT-1660
> URL: https://issues.apache.org/jira/browse/MAHOUT-1660
> Project: Mahout
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.10.1
>Reporter: Suneel Marthi
>Assignee: Dmitriy Lyubimov
>Priority: Minor
> Fix For: 0.10.0
>
>
> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop configuration from 
> Context and not ignore it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Mahout 0.10.0 Bug bash

2015-03-30 Thread Andrew Musselman
Monday (six days from code freeze Sunday)

Andrew Palumbo
--
M-1493: Port Naive Bayes to Spark DSL(Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
--
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Pat Ferrel
-
M-1507: Support input and output using user defined ID wherever possible
M-1589: mahout.cmd has duplicated content (Patch available)

Sebastian Schelter
--
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it (Patch available)

Shannon Quinn
---
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks
DisplaySpectralKMeans

Stevo Slavic

M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Suneel Marthi
-
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Unassigned
--
M-1551: Add document to describe how to use mlp with command line (Patch
available)
M-1557: Add support for sparse training vectors in MLP (Patch available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException (Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS (Patch
available)
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class


[jira] [Updated] (MAHOUT-1470) Topic dump

2015-03-30 Thread Andrew Musselman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Musselman updated MAHOUT-1470:
-
Fix Version/s: (was: 1.0)
   0.10.0

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1598) extend seq2sparse to handle multiple text blocks of same document

2015-03-30 Thread Andrew Musselman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Musselman resolved MAHOUT-1598.
--
Resolution: Fixed

Fixed by PR #34.

> extend seq2sparse to handle multiple text blocks of same document
> -
>
> Key: MAHOUT-1598
> URL: https://issues.apache.org/jira/browse/MAHOUT-1598
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.9
>Reporter: Wolfgang Buchner
>Assignee: Andrew Musselman
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Currently seq2sparse, or in particular 
> org.apache.mahout.vectorizer.DictionaryVectorizer, needs as input exactly one 
> text block per document.
> I stumbled on this because I have a use case where one document 
> represents a ticket which can have several text blocks in different 
> languages. 
> So my idea was that org.apache.mahout.vectorizer.DocumentProcessor should 
> tokenize each text block itself, so I can use language-specific features in 
> our Lucene Analyzer.
> Unfortunately the current implementation doesn't support this.
> But with just minor changes this can be made possible.
> The only thing that has to be changed is 
> org.apache.mahout.vectorizer.term.TFPartialVectorReducer, so that it handles all 
> values of the iterable (not just the first one).
> An alternative would be to change this Reducer to a Mapper; I don't understand why 
> this was implemented as a reducer in the first place. Is there any benefit from 
> this?
> I will provide a PR via GitHub.
> Please have a look at this and tell me if I am assuming anything wrong.
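
A minimal sketch of the behavioural change being requested (plain Scala with
hypothetical types, not the actual TFPartialVectorReducer code): fold the tokens of
every text block of a document into its term-frequency vector instead of using only
the first block.

    // Merge term frequencies from all text blocks belonging to one document key.
    def termFrequencies(blocks: Iterable[Seq[String]]): Map[String, Int] =
      blocks.iterator
        .flatten
        .foldLeft(Map.empty[String, Int]) { (tf, token) =>
          tf.updated(token, tf.getOrElse(token, 0) + 1)
        }

    // What the current behaviour amounts to, by contrast: only the first block counts.
    def firstBlockOnly(blocks: Iterable[Seq[String]]): Map[String, Int] =
      termFrequencies(blocks.take(1))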



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1477) Clean up website on Logistic Regression

2015-03-30 Thread Andrew Palumbo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Palumbo resolved MAHOUT-1477.

Resolution: Done

I've added some more references to the 20 newsgroups example and a few more 
links.  The images still could use an update.  

> Clean up website on Logistic Regression
> ---
>
> Key: MAHOUT-1477
> URL: https://issues.apache.org/jira/browse/MAHOUT-1477
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Sebastian Schelter
>Assignee: Andrew Palumbo
>  Labels: legacy
> Fix For: 0.10.0
>
>
> The website on Logistic regression needs clean up. We need to go through the 
> text, remove dead links and check whether the information is still consistent 
> with the current code. We should also link to the example created in 
> MAHOUT-1425 
> https://mahout.apache.org/users/classification/logistic-regression.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


refactor mr/hdfs

2015-03-30 Thread Pat Ferrel
Just pushed a version of the refactor that passes unit tests: 
https://github.com/apache/mahout/pull/86

It doesn't have the isDirectory fix yet, so it will not run on Hadoop 1.2.1, but if 
anyone can test a clustered Spark or Hadoop job on it, that would be appreciated. 

[jira] [Commented] (MAHOUT-1598) extend seq2sparse to handle multiple text blocks of same document

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386947#comment-14386947
 ] 

ASF GitHub Bot commented on MAHOUT-1598:


Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/34


> extend seq2sparse to handle multiple text blocks of same document
> -
>
> Key: MAHOUT-1598
> URL: https://issues.apache.org/jira/browse/MAHOUT-1598
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.9
>Reporter: Wolfgang Buchner
>Assignee: Andrew Musselman
>  Labels: legacy
> Fix For: 0.10.0
>
>
> Currently seq2sparse, or in particular 
> org.apache.mahout.vectorizer.DictionaryVectorizer, needs as input exactly one 
> text block per document.
> I stumbled on this because I have a use case where one document represents a 
> ticket that can contain several text blocks in different 
> languages. 
> My idea was that org.apache.mahout.vectorizer.DocumentProcessor should 
> tokenize each text block separately, so that I can use language-specific 
> features in our Lucene Analyzer.
> Unfortunately the current implementation doesn't support this,
> but it can be made possible with only minor changes.
> The only change needed is in 
> org.apache.mahout.vectorizer.term.TFPartialVectorReducer, which should handle 
> all values of the iterable, not just the first one (see the sketch after this 
> description).
> An alternative would be to change this Reducer to a Mapper; I don't understand 
> why it is implemented as a reducer in the first place. Is there any benefit 
> to that?
> I will provide a PR via GitHub.
> Please have a look at this and tell me if any of my assumptions are wrong.
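
For concreteness, here is a minimal sketch of the kind of reducer change described 
above: it consumes every value of the iterable for a document key rather than only 
the first. The class name AllBlocksReducer and the plain Text-to-Text types are 
simplified placeholders, not the actual TFPartialVectorReducer signature.

{code}
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Simplified illustration, not the real TFPartialVectorReducer:
// key = document id, values = the document's tokenized text blocks.
public class AllBlocksReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  protected void reduce(Text docId, Iterable<Text> blocks, Context context)
      throws IOException, InterruptedException {
    StringBuilder merged = new StringBuilder();
    // Walk over *all* values for this key instead of stopping at the first one.
    for (Text block : blocks) {
      if (merged.length() > 0) {
        merged.append(' ');
      }
      merged.append(block.toString());
    }
    context.write(docId, new Text(merged.toString()));
  }
}
{code}

Whether the merge step concatenates token streams (as here) or folds each block into 
a partial term-frequency vector (as the real reducer does) is an implementation 
detail; the point is simply the loop over the whole iterable.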



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1662) Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans

2015-03-30 Thread Shannon Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386917#comment-14386917
 ] 

Shannon Quinn commented on MAHOUT-1662:
---

https://github.com/apache/mahout/pull/89

> Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans
> 
>
> Key: MAHOUT-1662
> URL: https://issues.apache.org/jira/browse/MAHOUT-1662
> Project: Mahout
>  Issue Type: Bug
>  Components: Examples, mrlegacy
>Affects Versions: 0.9
>Reporter: Shannon Quinn
>Assignee: Shannon Quinn
> Fix For: 0.10.0
>
>
> Received the following error when attempting to run DisplaySpectralKMeans:
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: 
> file://tmp/calculations/diagonal/part-r-0/tmp/calculations/diagonal/part-r-0,
>  expected: file:///
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
>   at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:56)
>   at 
> org.apache.mahout.clustering.spectral.VectorCache.load(VectorCache.java:115)
>   at 
> org.apache.mahout.clustering.spectral.MatrixDiagonalizeJob.runJob(MatrixDiagonalizeJob.java:77)
>   at 
> org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:170)
>   at 
> org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:117)
>   at 
> org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:76)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> Tracked the origin of the bug to line 54 of SequenceFileVaultIterator. PR 
> which contains a fix is available; I would ask for independent verification 
> before merging it with master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1662) Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386916#comment-14386916
 ] 

ASF GitHub Bot commented on MAHOUT-1662:


GitHub user magsol opened a pull request:

https://github.com/apache/mahout/pull/89

MAHOUT-1662

Changed how the qualified Path is determined so as not to append itself 
repeatedly. Requires verification.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/magsol/mahout master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/89.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #89


commit 9f89e2cde040ea6f22a3039b141a391bfd435489
Author: Shannon Quinn 
Date:   2015-03-30T16:03:52Z

Changed how the qualified Path is determined so as not to append itself 
repeatedly. Requires verification.




> Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans
> 
>
> Key: MAHOUT-1662
> URL: https://issues.apache.org/jira/browse/MAHOUT-1662
> Project: Mahout
>  Issue Type: Bug
>  Components: Examples, mrlegacy
>Affects Versions: 0.9
>Reporter: Shannon Quinn
>Assignee: Shannon Quinn
> Fix For: 0.10.0
>
>
> Received the following error when attempting to run DisplaySpectralKMeans:
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: 
> file://tmp/calculations/diagonal/part-r-0/tmp/calculations/diagonal/part-r-0,
>  expected: file:///
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
>   at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:56)
>   at 
> org.apache.mahout.clustering.spectral.VectorCache.load(VectorCache.java:115)
>   at 
> org.apache.mahout.clustering.spectral.MatrixDiagonalizeJob.runJob(MatrixDiagonalizeJob.java:77)
>   at 
> org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:170)
>   at 
> org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:117)
>   at 
> org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:76)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> Tracked the origin of the bug to line 54 of SequenceFileVaultIterator. PR 
> which contains a fix is available; I would ask for independent verification 
> before merging it with master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAHOUT-1662) Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans

2015-03-30 Thread Shannon Quinn (JIRA)
Shannon Quinn created MAHOUT-1662:
-

 Summary: Potential Path bug in SequenceFileVaultIterator breaks 
DisplaySpectralKMeans
 Key: MAHOUT-1662
 URL: https://issues.apache.org/jira/browse/MAHOUT-1662
 Project: Mahout
  Issue Type: Bug
  Components: Examples, mrlegacy
Affects Versions: 0.9
Reporter: Shannon Quinn
Assignee: Shannon Quinn
 Fix For: 0.10.0


Received the following error when attempting to run DisplaySpectralKMeans:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: 
file://tmp/calculations/diagonal/part-r-0/tmp/calculations/diagonal/part-r-0,
 expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at 
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
at 
org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:56)
at 
org.apache.mahout.clustering.spectral.VectorCache.load(VectorCache.java:115)
at 
org.apache.mahout.clustering.spectral.MatrixDiagonalizeJob.runJob(MatrixDiagonalizeJob.java:77)
at 
org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:170)
at 
org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:117)
at 
org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Tracked the origin of the bug to line 54 of SequenceFileVaultIterator. PR which 
contains a fix is available; I would ask for independent verification before 
merging it with master.
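
For anyone verifying the fix, the safe pattern is to let the FileSystem qualify the 
path exactly once rather than assembling the qualified form by hand. Below is a 
minimal sketch of that pattern; the class and method names are placeholders, and 
this is not the actual SequenceFileValueIterator code from the PR.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Placeholder sketch of idempotent path qualification.
public final class QualifiedPathSketch {

  private QualifiedPathSketch() {}

  public static Path qualify(Path raw, Configuration conf) throws IOException {
    FileSystem fs = raw.getFileSystem(conf);
    // makeQualified adds the scheme/authority only if they are missing, so
    // calling it on an already qualified path does not append the path to
    // itself (the failure mode visible in the "Wrong FS" error above).
    return fs.makeQualified(raw);
  }

  public static void main(String[] args) throws IOException {
    Path p = qualify(new Path("/tmp/calculations/diagonal/part-r-0"),
        new Configuration());
    System.out.println(p);
  }
}
{code}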



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1585) Javadocs are not hosted By Mahout Quality

2015-03-30 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386912#comment-14386912
 ] 

Suneel Marthi commented on MAHOUT-1585:
---

Thanks Stevo. Please get in touch with Maximilian Michels 
(m...@data-artisans.com) from the Flink project if you need any help.

> Javadocs are not hosted By Mahout Quality
> -
>
> Key: MAHOUT-1585
> URL: https://issues.apache.org/jira/browse/MAHOUT-1585
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Andrew Palumbo
>Assignee: Stevo Slavic
>  Labels: DSL, legacy, scala, spark
> Fix For: 0.10.0
>
>
> The links to Javadocs for Math, Integration and Examples are all redirected 
> to a password protected Build page. MR-Legacy is currently the only Javadoc 
> being published and hosted by Mahout-Quality



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1585) Javadocs are not hosted By Mahout Quality

2015-03-30 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386905#comment-14386905
 ] 

Stevo Slavic commented on MAHOUT-1585:
--

I can work on this 

> Javadocs are not hosted By Mahout Quality
> -
>
> Key: MAHOUT-1585
> URL: https://issues.apache.org/jira/browse/MAHOUT-1585
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Andrew Palumbo
>Assignee: Stevo Slavic
>  Labels: DSL, legacy, scala, spark
> Fix For: 0.10.0
>
>
> The links to Javadocs for Math, Integration and Examples are all redirected 
> to a password protected Build page. MR-Legacy is currently the only Javadoc 
> being published and hosted by Mahout-Quality



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAHOUT-1585) Javadocs are not hosted By Mahout Quality

2015-03-30 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic reassigned MAHOUT-1585:


Assignee: Stevo Slavic  (was: Suneel Marthi)

> Javadocs are not hosted By Mahout Quality
> -
>
> Key: MAHOUT-1585
> URL: https://issues.apache.org/jira/browse/MAHOUT-1585
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Andrew Palumbo
>Assignee: Stevo Slavic
>  Labels: DSL, legacy, scala, spark
> Fix For: 0.10.0
>
>
> The links to Javadocs for Math, Integration and Examples are all redirected 
> to a password protected Build page. MR-Legacy is currently the only Javadoc 
> being published and hosted by Mahout-Quality



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MAHOUT-1585) Javadocs are not hosted By Mahout Quality

2015-03-30 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386895#comment-14386895
 ] 

Suneel Marthi edited comment on MAHOUT-1585 at 3/30/15 3:54 PM:


I heard from the Flink team, and below is what they are doing for hosting Javadocs 
and Scaladocs. I don't have the breathing space to work on this, so if someone 
would like to volunteer, please feel free.

{Code}

Flink documentation is built nightly via Buildbot provided by the Apache Infra 
team. Basically, you would create a Python config file for Mahout which defines 
a set of builders. You can check out Flink's config file as an example of how 
to build and publish docs [0]. There is also a README [1]. These builders [2] 
are then loaded into the global Buildbot config via the projects.conf file [3].

The tricky part about this setup is that the Buildbot config is loaded into a 
global namespace. So you have to be careful with syntax errors and duplicate 
variable names. Actual debugging is only possible via the help of the Apache 
Infra team. Their IRC channel has proven to be very helpful [4].

{Code}

[0] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
[1] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/readme.txt
[2] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects
[3] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/projects.conf
[4] https://www.hipchat.com/g7itl962x




was (Author: smarthi):
I heard from the Flink team, and below is what they are doing for hosting Javadocs 
and Scaladocs. I don't have the breathing space to work on this, so if someone 
would like to volunteer, please feel free.

{Code}

Flink documentation is built nightly via Buildbot provided by the Apache Infra 
team. Basically, you would create a Python config file for Mahout which defines 
a set of builders. You can check out Flink's config file as an example of how 
to build and publish docs [0]. There is also a README [1]. These builders [2] 
are then loaded into the global Buildbot config via the projects.conf file [3].

The tricky part about this setup is that the Buildbot config is loaded into a 
global namespace. So you have to be careful with syntax errors and duplicate 
variable names. Actual debugging is only possible via the help of the Apache 
Infra team. Their IRC channel has proven to be very helpful [4].

[0] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
[1] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/readme.txt
[2] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects
[3] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/projects.conf
[4] https://www.hipchat.com/g7itl962x

{Code}

> Javadocs are not hosted By Mahout Quality
> -
>
> Key: MAHOUT-1585
> URL: https://issues.apache.org/jira/browse/MAHOUT-1585
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>  Labels: DSL, legacy, scala, spark
> Fix For: 0.10.0
>
>
> The links to Javadocs for Math, Integration and Examples are all redirected 
> to a password protected Build page. MR-Legacy is currently the only Javadoc 
> being published and hosted by Mahout-Quality



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1585) Javadocs are not hosted By Mahout Quality

2015-03-30 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386895#comment-14386895
 ] 

Suneel Marthi commented on MAHOUT-1585:
---

I heard from the Flink team, and below is what they are doing for hosting Javadocs 
and Scaladocs. I don't have the breathing space to work on this, so if someone 
would like to volunteer, please feel free.

{Code}

Flink documentation is built nightly via Buildbot provided by the Apache Infra 
team. Basically, you would create a Python config file for Mahout which defines 
a set of builders. You can check out Flink's config file as an example of how 
to build and publish docs [0]. There is also a README [1]. These builders [2] 
are then loaded into the global Buildbot config via the projects.conf file [3].

The tricky part about this setup is that the Buildbot config is loaded into a 
global namespace. So you have to be careful with syntax errors and duplicate 
variable names. Actual debugging is only possible via the help of the Apache 
Infra team. Their IRC channel has proven to be very helpful [4].

[0] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
[1] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/readme.txt
[2] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects
[3] 
https://svn.us.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/projects.conf
[4] https://www.hipchat.com/g7itl962x

{Code}

> Javadocs are not hosted By Mahout Quality
> -
>
> Key: MAHOUT-1585
> URL: https://issues.apache.org/jira/browse/MAHOUT-1585
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>  Labels: DSL, legacy, scala, spark
> Fix For: 0.10.0
>
>
> The links to Javadocs for Math, Integration and Examples are all redirected 
> to a password protected Build page. MR-Legacy is currently the only Javadoc 
> being published and hosted by Mahout-Quality



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386872#comment-14386872
 ] 

Pat Ferrel commented on MAHOUT-1655:


So here is guava in the MAHOUT-1655 branch

mahout (parent pom): guava 14 (explicit version)
integration: guava 14 (inherited version)
mahout-hdfs: guava 14 (inherited version)
mahout-mr: guava 11 (explicit version)
math: guava 14 (inherited version)
mahout-spark: guava 14 (explicit version)
spark-shell: guava 14 (explicit version) <-- suspect this is not needed?

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386842#comment-14386842
 ] 

Stevo Slavic commented on MAHOUT-1655:
--

What I described yesterday (14.0.1 as the default, with 11.0.2 overriding it for 
mr) is not something I've done, but something I assumed you would do, Pat, in 
your pull request/branch.
The state in the master branch is currently the opposite.

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386844#comment-14386844
 ] 

Pat Ferrel commented on MAHOUT-1655:


In the Mahout script, I assume that mahout-mr is not needed for the H2O or Spark 
drivers and shell.

I am also removing mahout-mr from the H2O pom; now it only depends on mahout-hdfs.

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386814#comment-14386814
 ] 

Pat Ferrel commented on MAHOUT-1655:


In integration:

It seems only Iterators.skip needs to be changed to Iterators.advance.

Iterables.skip is a class method, so I'm leaving it alone.
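
For reference, a tiny self-contained example of the renamed call (the class name 
AdvanceExample is just a placeholder); it compiles against Guava 13.0 or later, 
such as the 14.0.1 used in the non-mr modules:

{code}
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

import com.google.common.collect.Iterators;

public class AdvanceExample {
  public static void main(String[] args) {
    List<String> items = Arrays.asList("a", "b", "c", "d");
    Iterator<String> it = items.iterator();

    // Guava 11.x: Iterators.skip(it, 2);
    // Guava 13.0 and later (e.g. 14.0.1): advance returns how many
    // elements were actually skipped.
    int skipped = Iterators.advance(it, 2);

    System.out.println(skipped + " skipped, next element: " + it.next());
  }
}
{code}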

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386795#comment-14386795
 ] 

Pat Ferrel commented on MAHOUT-1655:


I'm not questioning the API; you guys obviously know the Guava APIs.

I'm trying to understand the reasoning for 11 vs. 14. It seems odd to use 11 only 
in mahout-mr but 14 in integration, which is what you are calling for.

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386783#comment-14386783
 ] 

Pat Ferrel commented on MAHOUT-1655:


The way the poms are set up now makes Guava 14.0.1 the default, since it's in the 
parent pom. It is overridden to 11.0.2 in mahout-mr.

Spark forces 14.0.1 in its pom.

Is this correct? It means anything other than mahout-mr will use 14.0.1.

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Suneel Marthi
Iterators.advance() has been in place since Guava 13.0:
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterators.html#advance%28java.util.Iterator,%20int%29



On Mon, Mar 30, 2015 at 10:37 AM, Pat Ferrel (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386770#comment-14386770
> ]
>
> Pat Ferrel commented on MAHOUT-1655:
> 
>
> Suneel says: "Replace Iterators.skip() to Iterators.advance() to get past
> that error."
>
> I assume this is a Guava 14.0.1 API, since we are using that as the default
> for non-mr modules.
>
> > Refactor module dependencies
> > 
> >
> > Key: MAHOUT-1655
> > URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> > Project: Mahout
> >  Issue Type: Improvement
> >  Components: mrlegacy
> >Affects Versions: 0.9
> >Reporter: Pat Ferrel
> >Assignee: Andrew Musselman
> >Priority: Critical
> > Fix For: 0.10.0
> >
> >
> > Make a new module, call it mahout-hadoop. Move anything there that is
> currently in mrlegacy but used in math-scala or spark. Remove dependencies
> on mrlegacy altogether if possible by using other core classes.
> > The goal is to have math-scala and spark module depend on math, and a
> small module called mahout-hadoop (much smaller than mrlegacy).
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386780#comment-14386780
 ] 

Stevo Slavic commented on MAHOUT-1655:
--

Yes, the Guava guys renamed skip to advance in 13.0 (see 
[here|https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/Iterators.java?name=v14.0.1#971])

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-30 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386770#comment-14386770
 ] 

Pat Ferrel commented on MAHOUT-1655:


Suneel says: "Replace Iterators.skip() to Iterators.advance() to get past that 
error."

I assume this is a Guava 14.0.1 API, since we are using that as the default for 
non-mr modules.

> Refactor module dependencies
> 
>
> Key: MAHOUT-1655
> URL: https://issues.apache.org/jira/browse/MAHOUT-1655
> Project: Mahout
>  Issue Type: Improvement
>  Components: mrlegacy
>Affects Versions: 0.9
>Reporter: Pat Ferrel
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1659) Remove deprecated Lanczos solver from spectral clustering in mr-legacy

2015-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386293#comment-14386293
 ] 

Hudson commented on MAHOUT-1659:


SUCCESS: Integrated in Mahout-Quality #3027 (See 
[https://builds.apache.org/job/Mahout-Quality/3027/])
MAHOUT-1659: Remove deprecated Lanczos solver from spectral clustering in 
mr-legacy, this closes #88 (suneel.marthi: rev 
4b1c133325da0119e693b69811a54a16cd77aa55)
* CHANGELOG
* 
examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java
* 
examples/src/main/java/org/apache/mahout/clustering/display/DisplaySpectralKMeans.java
* 
mrlegacy/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java


> Remove deprecated Lanczos solver from spectral clustering in mr-legacy
> --
>
> Key: MAHOUT-1659
> URL: https://issues.apache.org/jira/browse/MAHOUT-1659
> Project: Mahout
>  Issue Type: Task
>  Components: Clustering, mrlegacy
>Affects Versions: 0.9
>Reporter: Shannon Quinn
>Assignee: Shannon Quinn
>Priority: Minor
> Fix For: 0.10.0
>
>
> Spectral clustering still has the option of using either SSVD or the Lanczos 
> solver for dimensionality reduction. Remove the latter entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)