Re: [ANNOUNCE] Andrew Musselman, New Mahout PMC Chair
Congratulations, Andrew! - G > On Jul 18, 2018, at 22:30, Andrew Palumbo wrote: > > Please join me in congratulating Andrew Musselman as the new Chair of the > Apache Mahout Project Management Committee. I would like to thank Andrew > for stepping up; all of us who have worked with him over the years know his > dedication to the project to be invaluable. I look forward to Andrew > taking the project into the future. > > Thank you, > > Andy
Re: Welcome Anand Avati
Welcome Anand! Sent from my iPhone On Apr 22, 2015, at 20:47, Dmitriy Lyubimov dlie...@gmail.com wrote: congrats and thank you! -d On Wed, Apr 22, 2015 at 10:33 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Welcome to the team Anand; thanks for your contributions! On Wed, Apr 22, 2015 at 10:29 AM, Anand Avati av...@gluster.org wrote: Thank you Suneel, I am thrilled to join the team! I am a relative newbie to data mining and machine learning. I currently work at Red Hat, but have joined grad school (in machine learning) starting this fall. I look forward to continuing my contributions, and thank you once again for the opportunity. Anand On Wed, Apr 22, 2015, 08:08 Suneel Marthi smar...@apache.org wrote: In recognition of the contributions of Anand Avati to the Mahout project over the past year, the PMC is pleased to announce that he has accepted our invitation to join the Mahout project as a committer. As is customary, I will leave it to Anand to provide a little bit of background about himself. Congratulations and Welcome! -Suneel Marthi On Behalf of Mahout PMC
Re: TF-IDF, seq2sparse and DataFrame support
Andrew, Maybe making the class tag evident in mapBlock calls?, i.e.: val tfIdfMatrix = tfMatrix.mapBlock(..){ ...idf transformation, etc... }(drmMetadata.keyClassTag.asInstanceOf[ClassTag[Any]]) Best, Gokhan On Tue, Mar 17, 2015 at 6:06 PM, Andrew Palumbo ap@outlook.com wrote: This (last commit on this branch) should be the beginning of a workaround for the problem of reading and returning a Generic-Writable keyed Drm: https://github.com/gcapan/mahout/commit/cd737cf1b7672d3d73fe206c7bad30aae3f37e14 However the keyClassTag of the DrmLike returned by the mapBlock() calls, and finally by the method itself, is somehow converted to Object. I'm not exactly sure why this is happening. I think that the implicit evidence is being dropped in the mapBlock call on an [Object]-cast CheckpointedDrm. Maybe calling it out of the scope of this method (breaking down the method) would fix it.

val tfMatrix = drmMetadata.keyClassTag match {
  case ct if ct == ClassTag.Int =>
    (drmWrap(rdd = tfVectors, ncol = numCols, cacheHint = CacheHint.NONE)(keyClassTag.asInstanceOf[ClassTag[Any]])).asInstanceOf[CheckpointedDrmSpark[Int]]
  case ct if ct == ClassTag(classOf[String]) =>
    (drmWrap(rdd = tfVectors, ncol = numCols, cacheHint = CacheHint.NONE)(keyClassTag.asInstanceOf[ClassTag[Any]])).asInstanceOf[CheckpointedDrmSpark[String]]
  case ct if ct == ClassTag.Long =>
    (drmWrap(rdd = tfVectors, ncol = numCols, cacheHint = CacheHint.NONE)(keyClassTag.asInstanceOf[ClassTag[Any]])).asInstanceOf[CheckpointedDrmSpark[Long]]
  case _ =>
    (drmWrap(rdd = tfVectors, ncol = numCols, cacheHint = CacheHint.NONE)(keyClassTag.asInstanceOf[ClassTag[Any]])).asInstanceOf[CheckpointedDrmSpark[Int]]
}

tfMatrix.checkpoint()

// make sure that the classtag of the tf matrix matches the metadata keyClassTag
assert(tfMatrix.keyClassTag == drmMetadata.keyClassTag) -- Passes here with e.g. String keys

val tfIdfMatrix = tfMatrix.mapBlock(..){ ...idf transformation, etc... }

assert(tfIdfMatrix.keyClassTag == drmMetadata.keyClassTag) -- Fails here for all, with tfIdfMatrix.keyClassTag as an Object.

I'll keep looking at it a bit. If anybody has any ideas please let me know. On 03/09/2015 02:12 PM, Gokhan Capan wrote: So, here is a sketch of a Spark implementation of seq2sparse, returning a (matrix:DrmLike, dictionary:Map): https://github.com/gcapan/mahout/tree/seq2sparse Although it should be possible, I couldn't manage to make it process non-integer document ids. Any fix would be appreciated. There is a simple test attached, but I think there is more to do in terms of handling all parameters of the original seq2sparse implementation. I put it directly to the SparkEngine ---not that I think this object is the most appropriate placeholder, it just seemed convenient to me. Best Gokhan On Thu, Feb 5, 2015 at 3:48 AM, Pat Ferrel p...@occamsmachete.com wrote: IndexedDataset might suffice until real DataFrames come along. On Feb 4, 2015, at 3:42 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Dealing with dictionaries is inevitably DataFrame for seq2sparse. It is a byproduct of it IIRC. matrix definitely not a structure to hold those. On Wed, Feb 4, 2015 at 9:16 AM, Andrew Palumbo ap@outlook.com wrote: On 02/04/2015 11:13 AM, Pat Ferrel wrote: Andrew, not sure what you mean about storing strings. If you mean something like a DRM of tokens, that is a DataFrame with row=doc column = token. A one row DataFrame is a slightly heavy weight string/document. A DataFrame with token counts would be perfect for input TF-IDF, no?
It would be a vector that maintains the tokens as ids for the counts, right? Yes - dataframes will be perfect for this. The problem that I was referring to was that we don't have a DSL data structure to do the initial distributed tokenizing of the documents [1] line:257, [2]. For this I believe we would need something like a distributed vector of Strings that could be broadcast to a mapBlock closure and then tokenized from there. Even there, mapBlock may not be perfect for this, but some of the new distributed functions that Gokhan is working on may be. I agree seq2sparse type input is a strong feature. Text files into an all-documents DataFrame basically. Collocation? As far as collocations, I believe that the n-grams are computed and counted in the CollocDriver [3] (I might be wrong here... it's been a while since I looked at the code...) either way, I don't think I ever looked too closely and I was a bit fuzzy on this... These were just some thoughts that I had when briefly looking at porting seq2sparse to the DSL before... Obviously we don't have to follow this algorithm but it's a nice starting point. [1]https
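[Editorial sketch] A minimal illustration of Gokhan's suggestion at the top of this thread: carry the key ClassTag through the call explicitly so the DRM returned by mapBlock keeps its key type instead of decaying to Object. This assumes the usual Mahout Samsara imports; weightIdf and the idf vector are hypothetical names, not code from the branch.

import scala.reflect.ClassTag
import org.apache.mahout.math.Vector
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.scalabindings.RLikeOps._

// Hypothetical helper: the key ClassTag travels with the call,
// so the result DRM stays keyed by K rather than by Object.
def weightIdf[K: ClassTag](tfMatrix: DrmLike[K], idf: Vector): DrmLike[K] =
  tfMatrix.mapBlock() { case (keys, block) =>
    // multiply each row's term frequencies elementwise by the in-core idf vector
    for (r <- 0 until block.nrow) block(r, ::) := block(r, ::) * idf
    keys -> block
  }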
Re: TF-IDF, seq2sparse and DataFrame support
Some answers: - Non-integer document ids: The implementation does not use operations defined for DrmLike[Int]-only, so the row keys do not have to be Int's. I just couldn't manage to create the returning DrmLike with the correct key type. Although while wrapping into a DrmLike, I tried to pass the key-class using HDFS utils like they are being used in drmDfsRead, but I somehow wasn't successful. So non-int document ids is not an actual issue here. - Breaking the implementation out to smaller pieces: Let's just collect the requirements and adjust the implementation accordingly. I honestly didn't think very much about where the implementation fits in, architecturally, and what pieces are of public interest. Best Gokhan On Tue, Mar 10, 2015 at 3:56 AM, Suneel Marthi suneel.mar...@gmail.com wrote: AP, How is your impl different from Gokhan's? On Mon, Mar 9, 2015 at 9:54 PM, Andrew Palumbo ap@outlook.com wrote: BTW, I'm not sure o.a.m.nlp is the best package name for either; I was using it because o.a.m.vectorizer, which is probably a better name, had conflicts in mrlegacy. On 03/09/2015 09:29 PM, Andrew Palumbo wrote: I meant would o.a.m.nlp in the spark module be a good place for Gokhan's seq2sparse implementation to live? On 03/09/2015 09:07 PM, Pat Ferrel wrote: Does o.a.m.nlp in the spark module seem like a good place for this to live? I think you meant math-scala? Actually we should rename math to core. On Mar 9, 2015, at 3:15 PM, Andrew Palumbo ap@outlook.com wrote: Cool- This is great! I think this is really important to have in. +1 to a pull request for comments. I have pr#75 (https://github.com/apache/mahout/pull/75) open - It has very simple TF and TFIDF classes based on lucene's IDF calculation and MLlib's. I just got a bad flu and haven't had a chance to push it. It creates an o.a.m.nlp package in mahout-math. I will push that as soon as I can in case you want to use them. Does o.a.m.nlp in the spark module seem like a good place for this to live? Those classes may be of use to you- they're very simple and are intended for new document vectorization once the legacy deps are removed from the spark module. They also might make interoperability with easier. One thought, having not been able to look at this too closely yet. // do we need to calculate df-vector? 1. We do need a document frequency map or vector to be able to calculate the IDF terms when vectorizing a new document outside of the original corpus. On 03/09/2015 05:10 PM, Pat Ferrel wrote: Ah, you are doing all the lucene analyzer, ngrams and other tokenizing, nice. On Mar 9, 2015, at 2:07 PM, Pat Ferrel p...@occamsmachete.com wrote: Ah, I found the right button in Github, no PR necessary. On Mar 9, 2015, at 1:55 PM, Pat Ferrel p...@occamsmachete.com wrote: If you create a PR it’s easier to see what was changed. Wouldn’t it be better to read in files from a directory assigning doc-id = filename and term-ids = terms, or are there still Hadoop pipeline tools that are needed to create the sequence files? This sort of mimics the way Spark reads SchemaRDDs from Json files. BTW this can also be done with a new reader trait on the IndexedDataset. It will give you two bidirectional maps (BiMap) and a DrmLike[Int]. One BiMap gives any String -> Int for rows, the other does the same for columns (text tokens). This would be a few lines of code since the string mapping and DRM creation is already written. The only thing to do would be to map the doc/row ids to filenames.
This allows you to take the non-int doc ids out of the DRM and replace them with a map. Not based on a Spark dataframe yet probably will be. On Mar 9, 2015, at 11:12 AM, Gokhan Capan gkhn...@gmail.com wrote: So, here is a sketch of a Spark implementation of seq2sparse, returning a (matrix:DrmLike, dictionary:Map): https://github.com/gcapan/mahout/tree/seq2sparse Although it should be possible, I couldn't manage to make it process non-integer document ids. Any fix would be appreciated. There is a simple test attached, but I think there is more to do in terms of handling all parameters of the original seq2sparse implementation. I put it directly to the SparkEngine ---not that I think of this object is the most appropriate placeholder, it just seemed convenient to me. Best Gokhan On Thu, Feb 5, 2015 at 3:48 AM, Pat Ferrel p...@occamsmachete.com wrote: IndexedDataset might suffice until real DataFrames come along. On Feb 4, 2015, at 3:42 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Dealing with dictionaries is inevitably DataFrame for seq2sparse. It is a byproduct of it IIRC. matrix definitely not a structure to hold those. On Wed, Feb 4, 2015 at 9:16 AM, Andrew Palumbo ap@outlook.com wrote: On 02/04/2015 11:13 AM, Pat Ferrel
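[Editorial sketch] The TF/TFIDF classes referenced above (PR #75) are described as being based on Lucene's IDF calculation; the snippet below is only a sketch of that classic weighting, idf = 1 + ln(numDocs / (docFreq + 1)), not the code in the PR.

// Sketch of the classic Lucene-style weighting mentioned above; not the PR's actual classes.
class SketchTFIDF(numDocs: Long) {
  // weight of a term occurring termFreq times in a document and in docFreq documents overall
  def weight(termFreq: Int, docFreq: Int): Double = {
    val idf = 1.0 + math.log(numDocs.toDouble / (docFreq + 1.0))
    termFreq * idf
  }
}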
Re: TF-IDF, seq2sparse and DataFrame support
So, here is a sketch of a Spark implementation of seq2sparse, returning a (matrix:DrmLike, dictionary:Map): https://github.com/gcapan/mahout/tree/seq2sparse Although it should be possible, I couldn't manage to make it process non-integer document ids. Any fix would be appreciated. There is a simple test attached, but I think there is more to do in terms of handling all parameters of the original seq2sparse implementation. I put it directly to the SparkEngine ---not that I think this object is the most appropriate placeholder, it just seemed convenient to me. Best Gokhan On Thu, Feb 5, 2015 at 3:48 AM, Pat Ferrel p...@occamsmachete.com wrote: IndexedDataset might suffice until real DataFrames come along. On Feb 4, 2015, at 3:42 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Dealing with dictionaries is inevitably DataFrame for seq2sparse. It is a byproduct of it IIRC. matrix definitely not a structure to hold those. On Wed, Feb 4, 2015 at 9:16 AM, Andrew Palumbo ap@outlook.com wrote: On 02/04/2015 11:13 AM, Pat Ferrel wrote: Andrew, not sure what you mean about storing strings. If you mean something like a DRM of tokens, that is a DataFrame with row=doc column = token. A one row DataFrame is a slightly heavy weight string/document. A DataFrame with token counts would be perfect for input TF-IDF, no? It would be a vector that maintains the tokens as ids for the counts, right? Yes - dataframes will be perfect for this. The problem that I was referring to was that we don't have a DSL data structure to do the initial distributed tokenizing of the documents [1] line:257, [2]. For this I believe we would need something like a distributed vector of Strings that could be broadcast to a mapBlock closure and then tokenized from there. Even there, mapBlock may not be perfect for this, but some of the new distributed functions that Gokhan is working on may be. I agree seq2sparse type input is a strong feature. Text files into an all-documents DataFrame basically. Collocation? As far as collocations, I believe that the n-grams are computed and counted in the CollocDriver [3] (I might be wrong here... it's been a while since I looked at the code...) either way, I don't think I ever looked too closely and I was a bit fuzzy on this... These were just some thoughts that I had when briefly looking at porting seq2sparse to the DSL before... Obviously we don't have to follow this algorithm but it's a nice starting point. [1] https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java [2] https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/vectorizer/DocumentProcessor.java [3] https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/vectorizer/collocations/llr/CollocDriver.java On Feb 4, 2015, at 7:47 AM, Andrew Palumbo ap@outlook.com wrote: Just copied over the relevant last few messages to keep the other thread on topic... On 02/03/2015 08:22 PM, Dmitriy Lyubimov wrote: I'd suggest to consider this: remember all this talk about language-integrated spark ql being basically dataframe manipulation DSL? so now Spark devs are noticing this generality as well and are actually proposing to rename SchemaRDD into DataFrame and make it a mainstream data structure. (my told you so moment of sorts) What I am getting at: I'd suggest to make DRM and Spark's newly renamed DataFrame our two major structures.
In particular, standardize on using DataFrame for things that may include non-numerical data and require more grace about column naming and manipulation. Maybe relevant to TF-IDF work when it deals with non-matrix content. Sounds like a worthy effort to me. We'd be basically implementing an API at the math-scala level for SchemaRDD/DataFrame data structures, correct? On Tue, Feb 3, 2015 at 5:01 PM, Pat Ferrel p...@occamsmachete.com wrote: Seems like seq2sparse would be really easy to replace since it takes text files to start with, then the whole pipeline could be kept in rdds. The dictionaries and counts could be either in-memory maps or rdds for use with joins? This would get rid of sequence files completely from the pipeline. Item similarity uses in-memory maps but the plan is to make it more scalable using joins as an alternative with the same API allowing the user to trade-off footprint for speed. I think you're right- should be relatively easy. I've been looking at porting seq2sparse to the DSL for a bit now and the stopper at the DSL level is that we don't have a distributed data structure for strings... Seems like getting a DataFrame implemented as Dmitriy mentioned above would take care of this problem. The other issue I'm a little fuzzy on is the distributed collocation mapping- it's a part
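[Editorial sketch] As a rough illustration of the "keep the whole pipeline in RDDs, dictionaries as in-memory maps" idea discussed above (hypothetical names; this is not Gokhan's branch):

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.mahout.math.{RandomAccessSparseVector, Vector}

// tokenize -> build a dictionary -> emit sparse TF vectors, all in RDDs;
// docs is (docId, text); the dictionary is collected to an in-memory map and broadcast.
def tokensToTf(docs: RDD[(String, String)]): (RDD[(String, Vector)], Map[String, Int]) = {
  val tokenized = docs.mapValues(_.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq)
  val dictionary = tokenized.flatMap(_._2).distinct().collect().zipWithIndex.toMap
  val bcDict = docs.sparkContext.broadcast(dictionary)
  val tfVectors = tokenized.mapValues { tokens =>
    val v: Vector = new RandomAccessSparseVector(bcDict.value.size)
    tokens.foreach(t => bcDict.value.get(t).foreach(i => v.setQuick(i, v.getQuick(i) + 1.0)))
    v
  }
  (tfVectors, dictionary)
}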
Re: Codebase refactoring proposal
What I am saying is that for certain algorithms including both engine-specific (such as aggregation) and DSL stuff, what is the best way of handling them? i) should we add the distributed operations to the Mahout codebase as it is proposed in #62? ii) should we have [engine]-ml modules (like spark-bindings and h2o-bindings) where we can mix the DSL and engine-specific stuff? Picking i. has the advantage of writing an ML-algorithm once and then it can be run on alternative engines, but it requires wrapping/duplicating existing distributed operations. Picking ii. has the advantage of avoiding writing distributed operations, but since we're mixing the DSL and the engine-specific stuff, an ML-algorithm written for an engine would not be available for the others. I just wanted to hear some opinions. Gokhan On Thu, Feb 5, 2015 at 4:11 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: I took it Gokhan had objections himself, based on his comments, if we are talking about #62. He also expressed concerns about computing GSGD but I suspect it can still be algebraically computed. On Wed, Feb 4, 2015 at 5:52 PM, Pat Ferrel p...@occamsmachete.com wrote: BTW Ted and Andrew have both expressed interest in the distributed aggregation stuff. It sounds like we are agreeing that non-algebra, computation-method-type things can be engine specific. So does anyone have an objection to Gokhan pushing his PR? On Feb 4, 2015, at 2:20 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Feb 4, 2015 at 1:51 PM, Andrew Palumbo ap@outlook.com wrote: My thought was not to bring primitive engine specific aggregators, combiners, etc. into math-scala. Yeah. +1. I would like to support that as an experiment, see where it goes. Clearly some distributed use cases are simple enough while also pervasive enough.
Re: TF-IDF, seq2sparse and DataFrame support
I think I have a sketch of implementation for creating a drm from a sequence file of (Int, Text)s, a.k.a. seq2sparse, using Spark. Give me a couple of days and I will provide an initial implementation. Best Gokhan On Wed, Feb 4, 2015 at 7:16 PM, Andrew Palumbo ap@outlook.com wrote: On 02/04/2015 11:13 AM, Pat Ferrel wrote: Andrew, not sure what you mean about storing strings. If you mean something like a DRM of tokens, that is a DataFrame with row=doc column = token. A one row DataFrame is a slightly heavy weight string/document. A DataFrame with token counts would be perfect for input TF-IDF, no? It would be a vector that maintains the tokens as ids for the counts, right? Yes - dataframes will be perfect for this. The problem that I was referring to was that we don't have a DSL data structure to do the initial distributed tokenizing of the documents [1] line:257, [2]. For this I believe we would need something like a distributed vector of Strings that could be broadcast to a mapBlock closure and then tokenized from there. Even there, mapBlock may not be perfect for this, but some of the new distributed functions that Gokhan is working on may be. I agree seq2sparse type input is a strong feature. Text files into an all-documents DataFrame basically. Collocation? As far as collocations, I believe that the n-grams are computed and counted in the CollocDriver [3] (I might be wrong here... it's been a while since I looked at the code...) either way, I don't think I ever looked too closely and I was a bit fuzzy on this... These were just some thoughts that I had when briefly looking at porting seq2sparse to the DSL before... Obviously we don't have to follow this algorithm but it's a nice starting point. [1] https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java [2] https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/vectorizer/DocumentProcessor.java [3] https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/vectorizer/collocations/llr/CollocDriver.java On Feb 4, 2015, at 7:47 AM, Andrew Palumbo ap@outlook.com wrote: Just copied over the relevant last few messages to keep the other thread on topic... On 02/03/2015 08:22 PM, Dmitriy Lyubimov wrote: I'd suggest to consider this: remember all this talk about language-integrated spark ql being basically dataframe manipulation DSL? so now Spark devs are noticing this generality as well and are actually proposing to rename SchemaRDD into DataFrame and make it a mainstream data structure. (my told you so moment of sorts) What I am getting at: I'd suggest to make DRM and Spark's newly renamed DataFrame our two major structures. In particular, standardize on using DataFrame for things that may include non-numerical data and require more grace about column naming and manipulation. Maybe relevant to TF-IDF work when it deals with non-matrix content. Sounds like a worthy effort to me. We'd be basically implementing an API at the math-scala level for SchemaRDD/DataFrame data structures, correct? On Tue, Feb 3, 2015 at 5:01 PM, Pat Ferrel p...@occamsmachete.com wrote: Seems like seq2sparse would be really easy to replace since it takes text files to start with, then the whole pipeline could be kept in rdds. The dictionaries and counts could be either in-memory maps or rdds for use with joins? This would get rid of sequence files completely from the pipeline.
Item similarity uses in-memory maps but the plan is to make it more scalable using joins as an alternative with the same API allowing the user to trade-off footprint for speed. I think you're right- should be relatively easy. I've been looking at porting seq2sparse to the DSL for a bit now and the stopper at the DSL level is that we don't have a distributed data structure for strings... Seems like getting a DataFrame implemented as Dmitriy mentioned above would take care of this problem. The other issue I'm a little fuzzy on is the distributed collocation mapping- it's a part of the seq2sparse code that I've not spent too much time in. I think that this would be a very worthy effort as well- I believe seq2sparse is a particularly strong mahout feature. I'll start another thread since we're now way off topic from the refactoring proposal. My use for TF-IDF is for row similarity and would take a DRM (actually IndexedDataset) and calculate row/doc similarities. It works now but only using LLR. This is OK when thinking of the items as tags or metadata but for text tokens something like cosine may be better. I’d imagine a downsampling phase that would precede TF-IDF using LLR a lot like how CF preferences are downsampled. This would produce a sparsified all-docs DRM. Then (if the counts were saved) TF-IDF would re-weight the terms before row similarity uses cosine. This is not so
Re: Code quality questions
+1 for favoring native scala types. I think in terms of Scala code, we need a clear style standards definition to adhere to. Gokhan On Fri, Jan 23, 2015 at 9:38 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: in TextDelimitedReaderWriter.scala: ===

val itemList: collection.mutable.MutableList[org.apache.mahout.common.Pair[Integer, Double]] =
  new collection.mutable.MutableList[org.apache.mahout.common.Pair[Integer, Double]]
for (ve <- itemVector.nonZeroes) {
  val item: org.apache.mahout.common.Pair[Integer, Double] =
    new org.apache.mahout.common.Pair[Integer, Double](ve.index, ve.get)
  itemList += item
}

(1) why does scala code attempt to use common.Pair? What was wrong about the native Tuple type of scala? (I am trying to clean out mrlegacy dependencies from the spark module). (2) why is it so horribly styled (even for me)? Comments are misaligned, the lines routinely exceed 120 characters. Can these problems please be addressed? In particular, stuff like o.a.m.common.Pair? And why was it even signed off on in the first place by committers despite clear style violations? thank you.
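[Editorial sketch] To make the first point concrete, the same loop written with native Scala tuples and without the mutable list (assuming itemVector is an o.a.m.math.Vector, as in the original):

import scala.collection.JavaConversions._

// same content as the snippet above, but with Scala tuples instead of o.a.m.common.Pair
val itemList: Seq[(Int, Double)] =
  itemVector.nonZeroes.map(ve => ve.index -> ve.get).toSeq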
[jira] [Created] (MAHOUT-1626) Support for required quasi-algebraic operations and starting with aggregating rows/blocks
Gokhan Capan created MAHOUT-1626: Summary: Support for required quasi-algebraic operations and starting with aggregating rows/blocks Key: MAHOUT-1626 URL: https://issues.apache.org/jira/browse/MAHOUT-1626 Project: Mahout Issue Type: New Feature Components: Math Affects Versions: 1.0 Reporter: Gokhan Capan Fix For: 1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAHOUT-1616) Better support for hadoop dependencies of multiple versions
[ https://issues.apache.org/jira/browse/MAHOUT-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan resolved MAHOUT-1616. -- Resolution: Fixed Better support for hadoop dependencies of multiple versions Key: MAHOUT-1616 URL: https://issues.apache.org/jira/browse/MAHOUT-1616 Project: Mahout Issue Type: Improvement Components: build Reporter: Gokhan Capan Assignee: Gokhan Capan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: SGD Implementation and Questions for mapBlock like functionality
Awesome. So we are going to implement certain required DistributedOperations, in a separate trait similar to, but other than, the DistributedEngine. I'll think about this a little more, and propose an initial implementation that hopefully we can agree on. Best, Gokhan On Thu, Nov 13, 2014 at 1:35 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Nov 12, 2014 at 1:44 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Nov 12, 2014 at 1:27 PM, Gokhan Capan gkhn...@gmail.com wrote: My only concern is to add certain loss minimization tools for people to write machine learning algorithms. mapBlock as you suggested can work equally, but I happened to have implemented the aggregate op while thinking. Apart from this SGD implementation, blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks is, I believe, certainly required, since block level parallelization is really common in matrix computations. Plus, if we are to add, say, a descriptive statistics package, that would require a similar functionality, too. If mapBlocks for passing custom operators were more flexible, I'd be more than happy, but I understand the idea behind its requirement that mapping should be block-to-block with the same row size. Could you give a little more detail on the 'common distributed strategy' idea? The idea is simple. First, don't use logical plan construction. In practice it means that while, say, A.%*%(B) creates a logical plan element (which is subsequently run thru the optimizer), something like aggregate(..) does not do that. Instead, it just produces ... whatever it produces, directly. So it doesn't form any new logical nor physical plan. Second, it means that we can define an internal strategy trait, something like DistributedOperations, which will include this set of operations. Subsequently, we will define native implementations of this trait in the same way we defined some native stuff for the DistributedEngine trait. (but don't make it part of DistributedEngine trait please -- maybe an attribute perhaps). At run time we will have to ask the current engine to provide the distributed operation implementation and delegate execution of common fragments to it.
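[Editorial sketch] A rough sketch of what such a separate strategy trait might look like; names are hypothetical and this is an illustration of the idea, not a proposed patch:

import scala.reflect.ClassTag
import org.apache.mahout.math.{Matrix, Vector}
import org.apache.mahout.math.drm.DrmLike

// Engine-provided quasi-algebraic operations, executed directly with no logical/physical plan.
// Kept out of DistributedEngine; at run time the current engine supplies an implementation.
trait DistributedOperations {
  // Fold each vertical block of a DRM into a vector, then combine the per-block results.
  def aggregateBlocks[K: ClassTag](drm: DrmLike[K])(seqOp: Matrix => Vector,
                                                    combOp: (Vector, Vector) => Vector): Vector
}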
Re: SGD Implementation and Questions for mapBlock like functionality
My only concern is to add certain loss minimization tools for people to write machine learning algorithms. mapBlock as you suggested can work equally, but I happened to have implemented the aggregate op while thinking. Apart from this SGD implementation, blockify-a-matrix-and-run-an-operation-in-parallel-on-blocks is, I believe, certainly required, since block level parallelization is really common in matrix computations. Plus, if we are to add, say, a descriptive statistics package, that would require a similar functionality, too. If mapBlocks for passing custom operators were more flexible, I'd be more than happy, but I understand the idea behind its requirement that mapping should be block-to-block with the same row size. Could you give a little more detail on the 'common distributed strategy' idea? Aside: Do we have certain elementwise Math functions in Matrix DSL? That is, how can I do this? 1 + exp(drmA) Gokhan On Wed, Nov 12, 2014 at 7:55 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Yes, I usually follow #2 too. The thing is, pretty often an algorithm can define its own set of strategies the backend needs to support (like this distributedEngine strategy) and keep a lot of logic still common across all strategies. But then, if an all-reduce aggregate operation is incredibly common among such algorithm-specific strategies, it stands to reason to implement it only once. I have an idea. Maybe we need a common distributed strategy which is different from the algebraic optimizer. That way we don't have to mess with algebraic rewrites. how about that? On Wed, Nov 12, 2014 at 9:12 AM, Pat Ferrel p...@occamsmachete.com wrote: So you are following #2, which is good. #1 seems a bit like a hack. For a long time to come we will have to add things to the DSL if it is to be kept engine independent. Yours looks pretty general and simple. Are you familiar with the existing Mahout aggregate methods? They show up in the SGDHelper.java and other places in legacy code. I don’t know much about them but they seem to be a pre-functional programming attempt at this kind of thing. It looks like you are proposing a replacement for those based on rdd.aggregate; if so, that would be very useful. For one thing it looks like the old aggregate was not parallel, rdd.aggregate is. On Nov 11, 2014, at 1:18 PM, Gokhan Capan gkhn...@gmail.com wrote: So the alternatives are: 1- mapBlock to a matrix all of whose rows but the first are empty, then aggregate 2- depend on a backend 1 is obviously OK. I don't like the idea of depending on a backend since SGD is a generic loss minimization, on which other algorithms will possibly depend. In this context, client-side aggregation is not an overhead, but even if it happens to be so, it doesn't have to be a client-side aggregate at all. Alternative to 1, I am thinking of at least having an aggregation operation, which will return an accumulated value anyway, and shouldn't affect algebra optimizations. I quickly implemented a naive one (supporting only Spark- I know I said that I don't like depending on a backend, but at least the backends-wide interface is consistent, and as a client, I still don't have to deal with Spark primitives directly). Is this nice enough? Is it too bad to have in the DSL? https://github.com/gcapan/mahout/compare/accumulateblocks Best Gokhan On Tue, Nov 11, 2014 at 10:45 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Oh. The algorithm actually collects the vectors and runs another cycle in the client!
Still, technically, you can collect almost-empty blocks to the client (since they are mostly empty, it won't cause THAT huge overhead compared to collecting single vectors, after all, how many partitions are we talking about? 1000? ). On Tue, Nov 11, 2014 at 12:41 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Sat, Nov 8, 2014 at 12:42 PM, Gokhan Capan gkhn...@gmail.com wrote: Hi, Based on Zinkevich et al.'s Parallelized Stochastic Gradient paper ( http://martin.zinkevich.org/publications/nips2010.pdf), I tried to implement SGD, and a regularized least squares solution for linear regression (can easily be extended to other GLMs, too). How the algorithm works is as follows: 1. Split data into partitions of T examples 2. in parallel, for each partition: 2.0. shuffle partition 2.1. initialize parameter vector 2.2. for each example in the shuffled partition 2.2.1 update the parameter vector 3. Aggregate all the parameter vectors and return I guess technically it is possible (transform each block to a SparseRowMatrix or SparseMatrix with only first valid row) and then invoke colSums() or colMeans() (whatever aggregate means). However, i am not sure it is worth the ugliness. isn't it easier to declare these things quasi-algebraic and just
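[Editorial sketch] On the aside near the top of this message ("how can I do this? 1 + exp(drmA)"): a sketch of one way to express it with mapBlock as it stood, assuming drmA: DrmLike[Int] and the usual Samsara imports; this is an illustration, not a claim about what the DSL shipped at the time.

// block-wise 1 + exp(drmA); only mapBlock is used, so the result stays a DRM
val drmB = drmA.mapBlock() { case (keys, block) =>
  val out = block.like()
  for (r <- 0 until block.rowSize(); c <- 0 until block.columnSize())
    out.setQuick(r, c, 1.0 + math.exp(block.getQuick(r, c)))
  keys -> out
}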
Re: SGD Implementation and Questions for mapBlock like functionality
Ted, Can we easily integrate t-digest for descriptives once we have block aggregates? This might count one more reason. Gokhan On Thu, Nov 13, 2014 at 12:04 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Nov 12, 2014 at 9:53 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: once we start mapping aggregate, there's no reason not to map other engine specific capabilities, which are vast. At this point dilemma is, no matter what we do we are losing coherency: if we map it all, then other engines will have trouble supporting all of it. If we don't map it all, then we are forcing capability reduction compared to what the engine actually can do. It is obvious to me that all-reduce aggregate will make a lot of sense -- even if it means math checkpoint. but then where do we stop in mapping those. E.g. do we do fold? cartesian? And what is that true reason we are remapping everything if it is already natively available? etc. etc. For myself, I still haven't figured a good answer to those . Actually, I disagree with the premise here. There *is* a reason not to map all other engine specific capabilities. That reason is we don't need them. Yet. So far, we *clearly* need some sort of block aggregate for a host of hog-wild sorts of algorithms. That doesn't imply that we need all kinds of mapping aggregates. It just means that we are clear on one need for now. So let's get this one in and see how far we can go. Also, having one kind of aggregation in the DSL does not restrict anyone from using engine specific capabilities. It just means that one kind of idiom can be done without engine specificity.
Re: SGD Implementation and Questions for mapBlock like functionality
So the alternatives are: 1- mapBlock to a matrix whose all rows-but-the first are empty, then aggregate 2- depend on a backend 1 is obviously OK. I don't like the idea of depending on a backend since SGD is a generic loss minimization, on which other algorithms will possibly depend. In this context, client-side aggregation is not an overhead, but even if it happens to be so, it doesn't have to be a client-side aggregate at all. Alternative to 1, I am thinking of at least having an aggregation operation, which will return an accumulated value anyway, and shouldn't affect algebra optimizations. I quickly implemented a naive one (supporting only Spark- I know I said that I don't like depending on a backend, but at least the backends-wide interface is consistent, and as a client, I still don't have to deal with Spark primitives directly). Is this nice enough? Is it too bad to have in the DSL? https://github.com/gcapan/mahout/compare/accumulateblocks Best Gokhan On Tue, Nov 11, 2014 at 10:45 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Oh. algorithm actually collects the vectors and runs another cycle in the client! Still, technically, you can collect almost-empty blocks to the client (since they are mostly empty, it won't cause THAT huge overhead compared to collecting single vectors, after all, how many partitions are we talking about? 1000? ). On Tue, Nov 11, 2014 at 12:41 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Sat, Nov 8, 2014 at 12:42 PM, Gokhan Capan gkhn...@gmail.com wrote: Hi, Based on Zinkevich et al.'s Parallelized Stochastic Gradient paper ( http://martin.zinkevich.org/publications/nips2010.pdf), I tried to implement SGD, and a regularized least squares solution for linear regression (can easily be extended to other GLMs, too). How the algorithm works is as follows: 1. Split data into partitions of T examples 2. in parallel, for each partition: 2.0. shuffle partition 2.1. initialize parameter vector 2.2. for each example in the shuffled partition 2.2.1 update the parameter vector 3. Aggregate all the parameter vectors and return I guess technically it is possible (transform each block to a SparseRowMatrix or SparseMatrix with only first valid row) and then invoke colSums() or colMeans() (whatever aggregate means). However, i am not sure it is worth the ugliness. isn't it easier to declare these things quasi-algebraic and just do direct spark calls on the matrix RDD (map, aggregate)? The real danger is to introduce non-algebra things into algebra so that the rest of the algebra doesn't optimize any more.
Re: SGD Implementation and Questions for mapBlock like functionality
Well, in that specific case, I will accumulate on the client side; the collection of the intermediate parameters is not that big (numBlocks x X.ncol). What I need is just mapping (keys, block) to a vector (currently, a mapBlock has to map the block to a new block). From a general perspective, you are right, this is an accumulation. Gokhan On Mon, Nov 10, 2014 at 8:26 PM, Pat Ferrel p...@occamsmachete.com wrote: Do you need a reduce or could you use an accumulator? Either is not really supported in the DSL but clearly these are required for certain algos. Broadcast vals are supported but are read only. On Nov 8, 2014, at 12:42 PM, Gokhan Capan gkhn...@gmail.com wrote: Hi, Based on Zinkevich et al.'s Parallelized Stochastic Gradient paper (http://martin.zinkevich.org/publications/nips2010.pdf), I tried to implement SGD, and a regularized least squares solution for linear regression (can easily be extended to other GLMs, too). How the algorithm works is as follows:
1. Split data into partitions of T examples
2. In parallel, for each partition:
2.0. shuffle the partition
2.1. initialize the parameter vector
2.2. for each example in the shuffled partition:
2.2.1. update the parameter vector
3. Aggregate all the parameter vectors and return
Here is an initial implementation to illustrate where I am stuck: https://github.com/gcapan/mahout/compare/optimization (See TODO in SGD.minimizeWithSgd[K]) I was thinking that using a blockified matrix of training instances, step 2 of the algorithm can run on blocks, and they can be aggregated client-side. However, the only operator that I know in the DSL is mapBlock, and it requires the BlockMapFunction to map a block to another block of the same row size. In this context, I want to map a block (numRows x n) to the parameter vector of size n. The question is: 1- Is it possible to easily implement the above algorithm using the DSL's current functionality? Could you tell me what I'm missing? 2- If there is no easy way other than using the currently-non-existing mapBlock-like method, shall we add such an operator? Best, Gokhan
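[Editorial sketch] A sketch of the "transform each block to a matrix with only the first valid row, then colSums/colMeans" workaround suggested earlier in this thread; sgdOnBlock, drmX and numBlocks are hypothetical names, and the usual Samsara imports are assumed.

// each block runs a local SGD sweep and writes its learned parameters into row 0;
// all other rows stay empty, so summing columns later just sums the per-block vectors
val perBlockParams = drmX.mapBlock() { case (keys, block) =>
  val params = sgdOnBlock(block)   // hypothetical: one pass of steps 2.0-2.2 over this block
  val out = block.like()
  out(0, ::) := params
  keys -> out
}
// step 3: aggregate; dividing the column sums by the number of blocks averages the parameters
val avgParams = perBlockParams.colSums() / numBlocks.toDouble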
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154918#comment-14154918 ] Gokhan Capan commented on MAHOUT-1329: -- Jay, This is integrated in trunk, not in 0.9, and should work. Also, you can find MAHOUT-1616 useful for a recent simplification and further improvement effort. Best Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: https://mahout.apache.org/developers/buildingmahout.html
By the way, I tried to simplify and improve things a bit here: MAHOUT-1616 Sent from my iPhone On Oct 1, 2014, at 15:26, Suneel Marthi suneel.mar...@gmail.com wrote: Mahout 0.9 doesn't support hadoop 2.x and was built with hadoop 1.2.1, hence the runtime errors you are seeing. The present codebase (unreleased) supports hadoop 2.x Sent from my iPhone On Oct 1, 2014, at 8:14 AM, Ted Dunning ted.dunn...@gmail.com wrote: I believe that the POM assumes particular versions as listed are version 2 and all others 1. Inspection of the top-level pom would provide the most authoritative answer. On Wed, Oct 1, 2014 at 7:08 AM, jay vyas jayunit100.apa...@gmail.com wrote: hi mahout: Can we use any hadoop version to build mahout? e.g. 2.4.1? It seems like if you give it a garbage hadoop version (e.g. 2.3.4.5), it still builds, yet at runtime it is clear that the version built is a 1.x version. thanks! FYI this is in relation to BIGTOP-1470, where we are just getting ready for our 0.8 release, so any feedback would be much appreciated! -- jay vyas
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154937#comment-14154937 ] Gokhan Capan commented on MAHOUT-1329: -- Jay, here is the documentation: http://mahout.apache.org/developers/buildingmahout.html And the instructions apply to trunk, not to the 0.9 release Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155309#comment-14155309 ] Gokhan Capan commented on MAHOUT-1329: -- Correct Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAHOUT-1616) Better support for hadoop dependencies of multiple versions
Gokhan Capan created MAHOUT-1616: Summary: Better support for hadoop dependencies of multiple versions Key: MAHOUT-1616 URL: https://issues.apache.org/jira/browse/MAHOUT-1616 Project: Mahout Issue Type: Improvement Components: build Reporter: Gokhan Capan Assignee: Gokhan Capan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Upgrade to spark 1.0.x
+1 to merging spark-1.0.x to master Sent from my iPhone On Aug 8, 2014, at 22:06, Dmitriy Lyubimov dlie...@gmail.com wrote: Current master is still at Spark 0.9.x. MAHOUT-1603 (PR #40) is making a number of valuable tweaks to enable Spark 1.0.x (and Spark SQL code, by extension; I did a quick test, and SQL seems to work for my simple tests in the Mahout environment). This squashed PR is pushed to the apache/mahout branch spark-1.0.x rather than master. Whenever (if) folks are ready, I can merge it to master. An alternative approach would be to maintain both 1.0.x and 0.9.x branches for some time. I don't see it as valuable, as the costs would likely overrun any benefit here, but if anyone still clings to the spark 0.9.x dependency, please let me know in this thread. thanks. -d
Re: standardizing minimal Matrix I/O capability
Pat, I was thinking of something like: https://github.com/gcapan/mahout/compare/cellin It's just an example of where I believe new input formats should go (the example is to input a DRM from a text file of row_id,col_id,value lines). Best Gokhan On Thu, Jul 31, 2014 at 12:00 AM, Pat Ferrel p...@occamsmachete.com wrote: Some work on this is being done as part of MAHOUT-1568, which is currently very early and in https://github.com/apache/mahout/pull/36. The idea there only covers text-delimited files and proposes a standard DRM-ish format but supports a configurable schema. Default is: rowID<tab>itemID1:value1<space>itemID2:value2… The IDs can be mahout keys of any type since they are written as text, or they can be application specific IDs meaningful in a particular usage, like a user ID hash, or SKU from a catalog, or URL. As far as dataframe-ish requirements, it seems to me there are two different things needed. The dataframe is needed while performing an algorithm or calculation and is kept in distributed data structures. There probably won’t be a lot of files kept around with the new engines. Any text files can be used for pipelines in a pinch but generally would be for import/export. Therefore MAHOUT-1568 concentrates on import/export, not dataframes, though it could use them when they are ready. On Jul 30, 2014, at 7:53 AM, Gokhan Capan notificati...@github.com wrote: I believe the next step should be standardizing minimal Matrix I/O capability (i.e. a couple file formats other than [row_id, VectorWritable] SequenceFiles) required for a distributed computation engine, and adding data-frame-like structures that allow text columns.
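[Editorial sketch] A sketch of what such an input path could look like with plain Spark plus drmWrap, independent of the linked branch; ncol is assumed to be known beforehand, and the helper name is hypothetical.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
import org.apache.mahout.math.drm.CheckpointedDrm
import org.apache.mahout.sparkbindings._

// read "row_id,col_id,value" lines, group cells by row, and wrap the resulting row-vector RDD
def drmFromCellFile(sc: SparkContext, path: String, ncol: Int): CheckpointedDrm[Int] = {
  val cells = sc.textFile(path).map { line =>
    val Array(r, c, v) = line.split(",")
    r.toInt -> (c.toInt, v.toDouble)
  }
  val rows = cells.groupByKey().map { case (rowId, cs) =>
    val vec: Vector = new RandomAccessSparseVector(ncol)
    cs.foreach { case (c, v) => vec.setQuick(c, v) }
    rowId -> vec
  }
  drmWrap(rows, ncol = ncol)
}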
[jira] [Resolved] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan resolved MAHOUT-1565. -- Resolution: Fixed add MR2 options to MAHOUT_OPTS in bin/mahout Key: MAHOUT-1565 URL: https://issues.apache.org/jira/browse/MAHOUT-1565 Project: Mahout Issue Type: Improvement Affects Versions: 1.0, 0.9 Reporter: Nishkam Ravi Assignee: Gokhan Capan Fix For: 1.0 Attachments: MAHOUT-1565.patch MR2 options are missing in MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add those options. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan reassigned MAHOUT-1565: Assignee: Gokhan Capan add MR2 options to MAHOUT_OPTS in bin/mahout Key: MAHOUT-1565 URL: https://issues.apache.org/jira/browse/MAHOUT-1565 Project: Mahout Issue Type: Improvement Affects Versions: 1.0, 0.9 Reporter: Nishkam Ravi Assignee: Gokhan Capan Fix For: 1.0 Attachments: MAHOUT-1565.patch MR2 options are missing in MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add those options. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062041#comment-14062041 ] Gokhan Capan commented on MAHOUT-1565: -- Sorry guys, I committed this 2 weeks ago, but I forgot to close the issue. Thank you, [~nravi] add MR2 options to MAHOUT_OPTS in bin/mahout Key: MAHOUT-1565 URL: https://issues.apache.org/jira/browse/MAHOUT-1565 Project: Mahout Issue Type: Improvement Affects Versions: 1.0, 0.9 Reporter: Nishkam Ravi Fix For: 1.0 Attachments: MAHOUT-1565.patch MR2 options are missing in MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add those options. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: H2O integration - completion and review
I'll write longer, but in general, +1 to Anand Sent from my iPhone On Jul 11, 2014, at 20:54, Anand Avati av...@gluster.org wrote: On Fri, Jul 11, 2014 at 9:29 AM, Pat Ferrel pat.fer...@gmail.com wrote: Duplicated from a comment on the PR: Beyond these details (specific merge issues) I have a bigger problem with merging this. Now every time the DSL is changed it may break things in h2o specific code. Merging this would require every committer who might touch the DSL to sign up for fixing any broken tests on both engines. To solve this the entire data prep pipeline must be virtualized to run on either engine so the tests for things like CF and ItemSimilarity or matrix factorization (and the multitude of others to come) pass and are engine independent. As it stands any DSL change that breaks the build will have to rely on a contributor's fix. Even if one of you guys were made a committer we will have this problem where a needed change breaks one or the other engine specific code. Unless 99% of the entire pipeline is engine neutral the build will be unmaintainable. For instance I am making a small DSL change that is required for cooccurrence and ItemSimilarity to work. This would break ItemSimilarity and its tests, which are in the spark module, but since I’m working on that I can fix everything. If someone working on an h2o specific thing had to change the DSL in a way that broke spark code like ItemSimilarity you might not be able to fix it, and I certainly do not want to fix stuff in h2o specific code when I change the DSL. I have a hard enough time keeping mine running :-) The way I interpret the above points, the problem you are trying to highlight is with having multiple backends in general, and not this backend specifically? Hypothetically, even if this backend is abandoned for the above problems, as more backends get added in the future, the same problems will continue to apply to all of them. Crudely speaking this means doing away with all references to a SparkContext and any use of it. So it's not just a matter of reproducing the spark module but reducing the need for one. Making it so small that breakages in one or the other engine's code will be infrequent and changes to neutral code will only rarely break an engine that the committer is unfamiliar with. I think things are already very close to this ideal situation you describe above. As a pipeline implementor we should just use DistributedContext, and not SparkContext. And we need an engine neutral way to get hold of a DistributedContext from within the math-scala module, like this pseudocode:

import org.apache.mahout.math.drm._
val dc = DistributedContextCreate(System.getenv("MAHOUT_BACKEND"), System.getenv("BACKEND_ID"), opts...)

If environment variables are not set, DistributedContextCreate could default to Spark and local. But all of the pipeline code should ideally exist outside any engine specific module. I raised this red flag a long time ago but in the heat of other issues it got lost. I don't think this can be ignored anymore. The only missing piece I think is having a DistributedContextCreate() call such as above? I don't think things are in such a dire state really... Am I missing something? I would propose that we should remain two separate projects with a mostly shared DSL until the maintainability issues are resolved. This seems way too early to merge. Call me an optimist, but I was hoping for more of a "let's work together now to make the DSL abstractions easier for future contributors."
I will explore such a DistributedContextCreate() method in math-scala. That might also be the answer for test cases to remain in math-scala. Thanks
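[Editorial sketch] A minimal sketch of the factory discussed above; the name comes from Anand's pseudocode, nothing like this existed in math-scala at the time, and a real version would likely resolve the engine-specific code reflectively so math-scala does not gain a compile-time Spark dependency.

import org.apache.mahout.math.drm.DistributedContext

// pick a backend by name, defaulting to Spark in local mode
object DistributedContextCreate {
  def apply(backend: String = "spark", masterUrl: String = "local"): DistributedContext =
    backend.toLowerCase match {
      case "spark" =>
        // direct call shown for brevity; a plan-neutral version would load this reflectively
        org.apache.mahout.sparkbindings.mahoutSparkContext(masterUrl = masterUrl, appName = "mahout")
      case "h2o" =>
        throw new UnsupportedOperationException("h2o context creation would be delegated here")
      case other =>
        throw new IllegalArgumentException("unknown backend: " + other)
    }
}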
Re: TF-IDF vector persistence with normalization enabled
That post implies that in order to have tf-idf vectors persisted, you need those options set in the tf vector creation phase. Or you can always run the Driver directly and easily, preferably from mahout's commandline, i.e. bin/mahout seq2sparse Gokhan On Tue, Jun 3, 2014 at 9:37 AM, David Noel david.i.n...@gmail.com wrote: I made an observation similar to what was pointed out in this mailing list post here: http://comments.gmane.org/gmane.comp.apache.mahout.user/17819; that TF-IDF vectors do not seem to persist when generating them with normalization enabled. According to Gokhan Capan: It seems that, to have tf-idf vectors later, you need to create tf vectors (DictionaryVectorizer.createTermFrequencyVectors) with the logNormalize option set to false, and the normPower option set to -1.0f. Is there some reason for this? It would seem useful if they persisted. Can someone explain the reasoning behind them not persisting? I figure there's a perfectly good reason, I just can't seem to figure out what it is.
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016372#comment-14016372 ] Gokhan Capan commented on MAHOUT-1329: -- Seems like the dependencies are correctly set. Are you certain that the cluster you're running mahout against is an hadoop-2 and M/R-2 cluster? Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016378#comment-14016378 ] Gokhan Capan commented on MAHOUT-1565: -- We agree, conceptually, but this needs some further testing. add MR2 options to MAHOUT_OPTS in bin/mahout Key: MAHOUT-1565 URL: https://issues.apache.org/jira/browse/MAHOUT-1565 Project: Mahout Issue Type: Improvement Affects Versions: 1.0, 0.9 Reporter: Nishkam Ravi Fix For: 1.0 Attachments: MAHOUT-1565.patch MR2 options are missing in MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add those options. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016565#comment-14016565 ] Gokhan Capan commented on MAHOUT-1329: -- Brian, This was actually well-tested. But I'm gonna build and test it again, probably tomorrow. By the way, can you run a {{$ find . -name hadoop*.jar}} after building mahout, in the mahout root directory? Best Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016998#comment-14016998 ] Gokhan Capan commented on MAHOUT-1529: -- Alright, I'm sold. Finalize abstraction of distributed logical plans from backend operations - Key: MAHOUT-1529 URL: https://issues.apache.org/jira/browse/MAHOUT-1529 Project: Mahout Issue Type: Improvement Reporter: Dmitriy Lyubimov Assignee: Dmitriy Lyubimov Fix For: 1.0 We have a few situations when algorithm-facing API has Spark dependencies creeping in. In particular, we know of the following cases: -(1) checkpoint() accepts Spark constant StorageLevel directly;- -(2) certain things in CheckpointedDRM;- -(3) drmParallelize etc. routines in the drm and sparkbindings package.- -(5) drmBroadcast returns a Spark-specific Broadcast object- (6) Stratosphere/Flink conceptual api changes. *Current tracker:* PR #1 https://github.com/apache/mahout/pull/1 - closed, need new PR for remaining things once ready. *Pull requests are welcome*. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014985#comment-14014985 ] Gokhan Capan commented on MAHOUT-1529: -- [~dlyubimov], I imagine in the near future we will want to add a matrix implementation with fast row and column access for in-memory algorithms such as neighborhood based recommendation. This could be a new persistent storage engineered for locality preservation of kNN, the new Solr backend potentially cast to a Matrix, or something else. Anyway, my point is that we could want to add different types of distributed matrices with engine (or data structure) specific strengths in the future. I suggest turning each behavior (such as Caching) into an additional trait, which the distributed execution engine (or data structure) author can mix in to her concrete implementation (For example Spark's matrix is one with Caching and Broadcasting). It might even help with easier logical planning (if it supports caching, cache it; if partitioned in the same way, do this, else do that; if one matrix is small, broadcast it; etc.). So I suggest a base Matrix trait with nrows and ncols methods (as it currently is), a BatchExecution trait with methods for partitioning and execution in parallel behavior, a Caching trait with methods for caching/uncaching behavior, and in the future a RandomAccess trait with methods for accessing rows and columns (and possibly cells) functionality. Then a concrete DRM (like) would be a Matrix with BatchOps and possibly CacheOps, a concrete RandomAccessMatrix would be a Matrix with RandomAccessOps, and so on. What do you think, and if you and others are positive, how do you think that should be handled? Finalize abstraction of distributed logical plans from backend operations - Key: MAHOUT-1529 URL: https://issues.apache.org/jira/browse/MAHOUT-1529 Project: Mahout Issue Type: Improvement Reporter: Dmitriy Lyubimov Assignee: Dmitriy Lyubimov Fix For: 1.0 We have a few situations when algorithm-facing API has Spark dependencies creeping in. In particular, we know of the following cases: -(1) checkpoint() accepts Spark constant StorageLevel directly;- -(2) certain things in CheckpointedDRM;- -(3) drmParallelize etc. routines in the drm and sparkbindings package.- -(5) drmBroadcast returns a Spark-specific Broadcast object- (6) Stratosphere/Flink conceptual api changes. *Current tracker:* PR #1 https://github.com/apache/mahout/pull/1 - closed, need new PR for remaining things once ready. *Pull requests are welcome*. -- This message was sent by Atlassian JIRA (v6.2#6252)
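[Editorial sketch] To make the proposed layering concrete, a purely illustrative sketch; the trait names are taken from the comment above, and this is not an actual patch.

import org.apache.mahout.math.Vector

// base trait plus mixin behaviors, as proposed above
trait MatrixLike {
  def nrow: Long
  def ncol: Int
}

trait BatchExecution { this: MatrixLike =>
  def blockify(): Unit                  // partitioning / run-in-parallel-on-blocks behavior
}

trait Caching { this: MatrixLike =>
  def cache(): this.type
  def uncache(): this.type
}

trait RandomAccess { this: MatrixLike =>
  def row(i: Long): Vector
  def column(j: Int): Vector
}

// e.g. a Spark DRM would be a MatrixLike with BatchExecution with Caching,
// while a RandomAccessMatrix would be a MatrixLike with RandomAccess.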
[jira] [Comment Edited] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014985#comment-14014985 ] Gokhan Capan edited comment on MAHOUT-1529 at 6/1/14 2:55 PM: -- [~dlyubimov], I imagine in the near future we will want to add a matrix implementation with fast row and column access for in-memory algorithms such as neighborhood based recommendation. This could be a new persistent storage engineered for locality preservation of kNN, the new Solr backend potentially cast to a Matrix, or something else. Anyway, my point is that we could want to add different types of distributed matrices with engine (or data structure) specific strengths in the future. I suggest turning each bahavior (such as Caching) into an additional trait, which the distributed execution engine (or data structure) author can mixin to her concrete implementation (For example Spark's matrix is one with Caching and Broadcasting). It might even help with easier logical planning (if it supports caching cache it, if partitioned in the same way do this else do this, if one matrix is small broadcast it etc.). So I suggest a a base Matrix trait with nrows and ncols methods (as it currently is), a BatchExecution trait with methods for partitioning and execution in parallel behavior, a Caching trait with methods for caching/uncaching behavior, in the future a RandomAccess trait with methods for accessing rows and columns (and possibly cells) functionality. Then a concrete DRM (like) would be a Matrix with BatchExecution and possibly Caching, a concrete RandomAccessMatrix would be a Matrix with RandomAccess, and so on. What do you think and if you and others are positive, how do you think that should be handled? was (Author: gokhancapan): [~dlyubimov], I imagine in the near future we will want to add a matrix implementation with fast row and column access for in-memory algorithms such as neighborhood based recommendation. This could be a new persistent storage engineered for locality preservation of kNN, the new Solr backend potentially cast to a Matrix, or something else. Anyway, my point is that we could want to add different types of distributed matrices with engine (or data structure) specific strengths in the future. I suggest turning each bahavior (such as Caching) into an additional trait, which the distributed execution engine (or data structure) author can mixin to her concrete implementation (For example Spark's matrix is one with Caching and Broadcasting). It might even help with easier logical planning (if it supports caching cache it, if partitioned in the same way do this else do this, if one matrix is small broadcast it etc.). So I suggest a a base Matrix trait with nrows and ncols methods (as it currently is), a BatchExecution trait with methods for partitioning and execution in parallel behavior, a Caching trait with methods for caching/uncaching behavior, in the future a RandomAccess trait with methods for accessing rows and columns (and possibly cells) functionality. Then a concrete DRM (like) would be a Matrix with BatchOps and possibly CacheOps, a concrete RandomAccessMatrix would be a Matrix with RandomAccessOps, and so on. What do you think and if you and others are positive, how do you think that should be handled? 
Finalize abstraction of distributed logical plans from backend operations - Key: MAHOUT-1529 URL: https://issues.apache.org/jira/browse/MAHOUT-1529 Project: Mahout Issue Type: Improvement Reporter: Dmitriy Lyubimov Assignee: Dmitriy Lyubimov Fix For: 1.0 We have a few situations when algorithm-facing API has Spark dependencies creeping in. In particular, we know of the following cases: -(1) checkpoint() accepts Spark constant StorageLevel directly;- -(2) certain things in CheckpointedDRM;- -(3) drmParallelize etc. routines in the drm and sparkbindings package.- -(5) drmBroadcast returns a Spark-specific Broadcast object- (6) Stratosphere/Flink conceptual api changes. *Current tracker:* PR #1 https://github.com/apache/mahout/pull/1 - closed, need new PR for remaining things once ready. *Pull requests are welcome*. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014985#comment-14014985 ] Gokhan Capan edited comment on MAHOUT-1529 at 6/1/14 3:03 PM: -- [~dlyubimov], I imagine in the near future we will want to add a matrix implementation with fast row and column access for memory-based algorithms such as neighborhood based recommendation. This could be a new persistent storage engineered for locality preservation of kNN, the new Solr backend potentially cast to a Matrix, or something else. Anyway, my point is that we could want to add different types of distributed matrices with engine (or data structure) specific strengths in the future. I suggest turning each bahavior (such as Caching) into an additional trait, which the distributed execution engine (or data structure) author can mixin to her concrete implementation (For example Spark's matrix is one with Caching and Broadcasting). It might even help with easier logical planning (if it supports caching cache it, if partitioned in the same way do this else do this, if one matrix is small broadcast it etc.). So I suggest a a base Matrix trait with nrows and ncols methods (as it currently is), a BatchExecution trait with methods for partitioning and execution in parallel behavior, a Caching trait with methods for caching/uncaching behavior, in the future a RandomAccess trait with methods for accessing rows and columns (and possibly cells) functionality. Then a concrete DRM (like) would be a Matrix with BatchExecution and possibly Caching, a concrete RandomAccessMatrix would be a Matrix with RandomAccess, and so on. What do you think and if you and others are positive, how do you think that should be handled? was (Author: gokhancapan): [~dlyubimov], I imagine in the near future we will want to add a matrix implementation with fast row and column access for in-memory algorithms such as neighborhood based recommendation. This could be a new persistent storage engineered for locality preservation of kNN, the new Solr backend potentially cast to a Matrix, or something else. Anyway, my point is that we could want to add different types of distributed matrices with engine (or data structure) specific strengths in the future. I suggest turning each bahavior (such as Caching) into an additional trait, which the distributed execution engine (or data structure) author can mixin to her concrete implementation (For example Spark's matrix is one with Caching and Broadcasting). It might even help with easier logical planning (if it supports caching cache it, if partitioned in the same way do this else do this, if one matrix is small broadcast it etc.). So I suggest a a base Matrix trait with nrows and ncols methods (as it currently is), a BatchExecution trait with methods for partitioning and execution in parallel behavior, a Caching trait with methods for caching/uncaching behavior, in the future a RandomAccess trait with methods for accessing rows and columns (and possibly cells) functionality. Then a concrete DRM (like) would be a Matrix with BatchExecution and possibly Caching, a concrete RandomAccessMatrix would be a Matrix with RandomAccess, and so on. What do you think and if you and others are positive, how do you think that should be handled? 
Finalize abstraction of distributed logical plans from backend operations - Key: MAHOUT-1529 URL: https://issues.apache.org/jira/browse/MAHOUT-1529 Project: Mahout Issue Type: Improvement Reporter: Dmitriy Lyubimov Assignee: Dmitriy Lyubimov Fix For: 1.0 We have a few situations when algorithm-facing API has Spark dependencies creeping in. In particular, we know of the following cases: -(1) checkpoint() accepts Spark constant StorageLevel directly;- -(2) certain things in CheckpointedDRM;- -(3) drmParallelize etc. routines in the drm and sparkbindings package.- -(5) drmBroadcast returns a Spark-specific Broadcast object- (6) Stratosphere/Flink conceptual api changes. *Current tracker:* PR #1 https://github.com/apache/mahout/pull/1 - closed, need new PR for remaining things once ready. *Pull requests are welcome*. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012126#comment-14012126 ] Gokhan Capan commented on MAHOUT-1565: -- I think there is no point in configuring output compression, number of reducers, etc. for Mahout. add MR2 options to MAHOUT_OPTS in bin/mahout Key: MAHOUT-1565 URL: https://issues.apache.org/jira/browse/MAHOUT-1565 Project: Mahout Issue Type: Improvement Affects Versions: 1.0, 0.9 Reporter: Nishkam Ravi Attachments: MAHOUT-1565.patch MR2 options are missing in MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add those options. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012140#comment-14012140 ] Gokhan Capan commented on MAHOUT-1565: -- Sorry, now I can read the patch properly. The MR1 versions of those configurations are already set in bin/mahout, and you're suggesting to add MR2 versions of them, too, right? I am personally not a fan of setting such configurations in Mahout, and I would remove them as well. add MR2 options to MAHOUT_OPTS in bin/mahout Key: MAHOUT-1565 URL: https://issues.apache.org/jira/browse/MAHOUT-1565 Project: Mahout Issue Type: Improvement Affects Versions: 1.0, 0.9 Reporter: Nishkam Ravi Attachments: MAHOUT-1565.patch MR2 options are missing in MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add those options. -- This message was sent by Atlassian JIRA (v6.2#6252)
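For context on what "MR2 versions" of such options look like, a hedged illustration only — the property names below are the standard Hadoop 1 keys and their Hadoop 2 renames, not a reproduction of the options actually set in bin/mahout or in the attached patch:

    # MR1-style properties of the kind passed to jobs through MAHOUT_OPTS
    export MAHOUT_OPTS="-Dmapred.reduce.tasks=10 -Dmapred.output.compress=true"

    # the MR2 equivalents of the same two settings
    export MAHOUT_OPTS="-Dmapreduce.job.reduces=10 -Dmapreduce.output.fileoutputformat.compress=true"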
Re: Hadoop 2 support in a real release?
My vote would be releasing mahout with hadoop1 and hadoop2 classifiers Gokhan On Fri, May 23, 2014 at 4:43 PM, Sebastian Schelter ssc.o...@googlemail.com wrote: Big +1 Am 23.05.2014 15:33 schrieb Ted Dunning ted.dunn...@gmail.com: What do folks think about spinning out a new version of 0.9 that only changes which version of Hadoop the build uses? There have been quite a few questions lately on this topic. My suggestion would be that we use minor version numbering to maintain this and the normal 0.9 release simultaneously if we decide to do a bug fix release. Any thoughts?
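To make the "classifiers" proposal concrete, a hedged sketch of how a downstream project would pick a variant if such artifacts were published (the coordinates and the 0.9.x version are assumptions, not released artifacts):

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>0.9.1</version>
      <classifier>hadoop2</classifier>  <!-- or hadoop1; omit for the default build -->
    </dependency>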
[jira] [Assigned] (MAHOUT-1534) Add documentation for using Mahout with Hadoop2 to the website
[ https://issues.apache.org/jira/browse/MAHOUT-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan reassigned MAHOUT-1534: Assignee: Gokhan Capan Add documentation for using Mahout with Hadoop2 to the website -- Key: MAHOUT-1534 URL: https://issues.apache.org/jira/browse/MAHOUT-1534 Project: Mahout Issue Type: Task Components: Documentation Reporter: Sebastian Schelter Assignee: Gokhan Capan Fix For: 1.0 MAHOUT-1329 describes how to build the current trunk for usage with Hadoop 2. We should have a page on the website describing this for our users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1534) Add documentation for using Mahout with Hadoop2 to the website
[ https://issues.apache.org/jira/browse/MAHOUT-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005663#comment-14005663 ] Gokhan Capan commented on MAHOUT-1534: -- We might want to add the link to the Mahout News, but let's wait and see if the users could locate the page. Add documentation for using Mahout with Hadoop2 to the website -- Key: MAHOUT-1534 URL: https://issues.apache.org/jira/browse/MAHOUT-1534 Project: Mahout Issue Type: Task Components: Documentation Reporter: Sebastian Schelter Assignee: Gokhan Capan Fix For: 1.0 MAHOUT-1329 describes how to build the current trunk for usage with Hadoop 2. We should have a page on the website describing this for our users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAHOUT-1534) Add documentation for using Mahout with Hadoop2 to the website
[ https://issues.apache.org/jira/browse/MAHOUT-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan resolved MAHOUT-1534. -- Resolution: Fixed The instructions are now available on the BuildingMahout page: http://mahout.apache.org/developers/buildingmahout.html Add documentation for using Mahout with Hadoop2 to the website -- Key: MAHOUT-1534 URL: https://issues.apache.org/jira/browse/MAHOUT-1534 Project: Mahout Issue Type: Task Components: Documentation Reporter: Sebastian Schelter Assignee: Gokhan Capan Fix For: 1.0 MAHOUT-1329 describes how to build the current trunk for usage with Hadoop 2. We should have a page on the website describing this for our users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005719#comment-14005719 ] Gokhan Capan commented on MAHOUT-1329: -- Please check http://mahout.apache.org/developers/buildingmahout.html for instructions on building Mahout against hadoop-2 Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Git Migration
Works for me as well Gokhan On Thu, May 22, 2014 at 9:23 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Thanks; I just pushed successfully. On Thu, May 22, 2014 at 10:55 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: did you read Jake's email earlier at dev/infra discussion? he describes and makes references here. It is two-fold: first we can push whatever commits to master of https://git-wip-us.apache.org/repos/asf?p=mahout.git However the other side of the coin is that significant commits should go thru pull requests directly to (if i understand it correctly) apache/mahout mirror on github. Such pull requests are managed thru commits to git-wp as well by specific messages (again, see references in Jake's email). My understanding is that github integration features are not yet enabled, only commits to master of git-wp-us.a.o are at this point. At this point I simply would like everyone to verify they can push commits to master branch of git-wp-us.a.o per instructions in INFRA- and report back there (I can push). I guess someone (perhaps me) will have to write the manual for working with github pull requests (mainly, merging them to git-wp-us.o.a and closing them). On Thu, May 22, 2014 at 10:47 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: What's the workflow to commit a change? I'm totally in the dark about that. On Thu, May 22, 2014 at 10:14 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hi, (1) git migration of the project is now complete. Any volunteers to verify per INFRA-? If you do, please report back to the issue. (2) Anybody knows what to do with jenkins now? i still don't have proper privileges on it. thanks. [1] https://issues.apache.org/jira/browse/INFRA-
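A hedged sketch of the commit workflow described above — the remote name, PR number, and exact closing phrase are illustrative; the canonical clone URL and PR conventions are the ones in Jake's infra post referenced earlier:

    # direct commits go to master of the ASF-hosted repository
    git remote add asf https://git-wip-us.apache.org/repos/asf/mahout.git
    git pull asf master
    git push asf master

    # merging a GitHub pull request against the apache/mahout mirror:
    # apply the change locally, reference the PR in the commit message so the
    # integration can close it, then push to the ASF master as usual
    git commit -am "MAHOUT-XXXX: short description (closes #12)"
    git push asf master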
Re: consensus statement?
I want to express my opinions for the vision, too. I tried to capture those words from various discussions in the dev-list, and hope that most, of them support the common sense of excitement the new Mahout arouses To me, the fundamental benefit of the shift that Mahout is undergoing is a better separation of the distributed execution engine, distributed data structures, matrix computations, and algorithms layers, which will allow the users/devs of Mahout with different roles focus on the relevant parts of the framework: 1. A machine learning scientist, independent from the underlying distributed execution engine, can utilize the matrix language and the decompositions to implement new algorithms (which implies that the current distributed mahout algorithms are to be rewritten in the matrix language) 2. A math-scala module contributor, for the benefit of higher level algorithms, can add new, or improve existing functions (the set of decompositions is an example) with optimization plans (such as if two matrices are partitioned in the same way, ...), where the concrete implementations of those optimizations are delegated to the distributed execution engine layer 3. A distributed execution engine author can add machine learning capabilities to her platform with i)concrete Matrix and Matrix I/O implementation ii)partitioning, checkpointing, broadcasting behaviors, iii)BLAS 4. A Mahout user with access to a cluster operated by a Mahout-supporting distributed execution engine can run machine learning algorithms implemented on top of the matrix language Best Gokhan On Tue, May 20, 2014 at 8:30 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: inline On Tue, May 20, 2014 at 12:42 AM, Sebastian Schelter s...@apache.org wrote: Let's take the next from our homepage as starting point. What should we add/remove/modify? The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce algorithms in the codebase and maintain them. We are building our future implementations on top of a Scala DSL for linear algebraic operations which has been developed over the last months. Programs written in this DSL are automatically optimized and executed in parallel for Apache Spark. More platforms to be added in the future. Furthermore, there is an experimental contribution undergoing which aims to integrate the h20 platform into Mahout.
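To make the first item in Gokhan's list above concrete, a small, hedged illustration of what "engine-independent" code in the matrix DSL looks like. The operator names (%*%, t, collect, mapBlock) are the DSL's own, but imports are omitted because the package layout was still being refactored at the time (see MAHOUT-1529); treat this as a sketch, not a definitive API reference:

    // drmA: a distributed row matrix with Int keys, obtained from drmWrap/drmDfsRead
    // on whichever backend is in scope
    val gram = (drmA.t %*% drmA).collect   // distributed A'A, materialized in-core;
                                           // the optimizer decides how to execute it

    // an element-wise transformation written once, runnable on any engine
    val drmTransformed = drmA.mapBlock() { case (keys, block) =>
      // ... operate on the in-core block, e.g. apply an IDF weighting ...
      keys -> block
    }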
[jira] [Commented] (MAHOUT-1534) Add documentation for using Mahout with Hadoop2 to the website
[ https://issues.apache.org/jira/browse/MAHOUT-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004662#comment-14004662 ] Gokhan Capan commented on MAHOUT-1534: -- [~ssc] I added the directions to the BuildingMahout page. If you're happy with the staged, I'll Publish Site Add documentation for using Mahout with Hadoop2 to the website -- Key: MAHOUT-1534 URL: https://issues.apache.org/jira/browse/MAHOUT-1534 Project: Mahout Issue Type: Task Components: Documentation Reporter: Sebastian Schelter Fix For: 1.0 MAHOUT-1329 describes how to build the current trunk for usage with Hadoop 2. We should have a page on the website describing this for our users. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: VOTE: moving commits to git-wp.o.a github PR features.
+1 Sent from my iPhone On May 16, 2014, at 21:38, Dmitriy Lyubimov dlie...@gmail.com wrote: Hi, I would like to initiate a procedural vote moving to git as our primary commit system, and using github PRs as described in Jake Farrel's email to @dev [1] [1] https://blogs.apache.org/infra/entry/improved_integration_between_apache_and If voting succeeds, i will file a ticket with infra to commence necessary changes and to move our project to git-wp as primary source for commits as well as add github integration features [1]. (I assume pure git commits will be required after that's done, with no svn commits allowed). The motivation is to engage GIT and github PR features as described, and avoid git mirror history messes like we've seen associated with authors.txt file fluctations. PMC and committers have binding votes, so please vote. Lazy consensus with minimum 3 +1 votes. Vote will conclude in 96 hours to allow some extra time for weekend (i.e. Tuesday afternoon PST) . here is my +1 -d
[jira] [Commented] (MAHOUT-1550) Naive Bayes training fails with Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996351#comment-13996351 ] Gokhan Capan commented on MAHOUT-1550: -- Paul, Did you try build mahout using hadoop 2 profile first? The way to do it is: mvn clean package -DskipTests=true -Dhadoop2.version=YOUR_HADOOP_VERSION Let us know if this fails Naive Bayes training fails with Hadoop 2 Key: MAHOUT-1550 URL: https://issues.apache.org/jira/browse/MAHOUT-1550 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 1.0 Environment: Ubuntu - Mahout 1.0-SNAPSHOT - Hadoop 2 Reporter: Paul Marret Priority: Minor Labels: bayesian, training Attachments: mahout-snapshot.patch, stacktrace.txt Original Estimate: 0h Remaining Estimate: 0h When using the trainnb option of the program, we get the following error: Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174) at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614) at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:100) [...] It is possible to correct this by modifying the file mrlegacy/src/main/java/org/apache/mahout/common/HadoopUtil.java and converting the instance job (line 174) to a Job object (it is a JobContext in the current version). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (MAHOUT-1550) Naive Bayes training fails with Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996351#comment-13996351 ] Gokhan Capan edited comment on MAHOUT-1550 at 5/13/14 1:10 PM: --- Paul, Did you try building mahout using hadoop 2 profile first? The way to do it is: mvn clean package -DskipTests=true -Dhadoop2.version=YOUR_HADOOP_VERSION Let us know if this fails was (Author: gokhancapan): Paul, Did you try build mahout using hadoop 2 profile first? The way to do it is: mvn clean package -DskipTests=true -Dhadoop2.version=YOUR_HADOOP_VERSION Let us know if this fails Naive Bayes training fails with Hadoop 2 Key: MAHOUT-1550 URL: https://issues.apache.org/jira/browse/MAHOUT-1550 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 1.0 Environment: Ubuntu - Mahout 1.0-SNAPSHOT - Hadoop 2 Reporter: Paul Marret Priority: Minor Labels: bayesian, training Attachments: mahout-snapshot.patch, stacktrace.txt Original Estimate: 0h Remaining Estimate: 0h When using the trainnb option of the program, we get the following error: Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174) at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614) at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:100) [...] It is possible to correct this by modifying the file mrlegacy/src/main/java/org/apache/mahout/common/HadoopUtil.java and converting the instance job (line 174) to a Job object (it is a JobContext in the current version). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968148#comment-13968148 ] Gokhan Capan commented on MAHOUT-1178: -- Well I can add this, but considering the current status of the project, I think this is no longer in people's interest. What do you say [~ssc], should we 'won't fix' it or commit? GSOC 2013: Improve Lucene support in Mahout --- Key: MAHOUT-1178 URL: https://issues.apache.org/jira/browse/MAHOUT-1178 Project: Mahout Issue Type: New Feature Reporter: Dan Filimon Assignee: Gokhan Capan Labels: gsoc2013, mentor Fix For: 1.0 Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch [via Ted Dunning] It should be possible to view a Lucene index as a matrix. This would require that we standardize on a way to convert documents to rows. There are many choices, the discussion of which should be deferred to the actual work on the project, but there are a few obvious constraints: a) it should be possible to get the same result as dumping the term vectors for each document each to a line and converting that result using standard Mahout methods. b) numeric fields ought to work somehow. c) if there are multiple text fields that ought to work sensibly as well. Two options include dumping multiple matrices or to convert the fields into a single row of a single matrix. d) it should be possible to refer back from a row of the matrix to find the correct document. THis might be because we remember the Lucene doc number or because a field is named as holding a unique id. e) named vectors and matrices should be used if plausible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968221#comment-13968221 ] Gokhan Capan commented on MAHOUT-1178: -- I personally like the idea of integrating additional storage layers as matrix inputs, but not the implementation I did here. After agreeing on the new algorithm layers, we can later move on to the additional input formats. So my vote also is for Won't Fix GSOC 2013: Improve Lucene support in Mahout --- Key: MAHOUT-1178 URL: https://issues.apache.org/jira/browse/MAHOUT-1178 Project: Mahout Issue Type: New Feature Reporter: Dan Filimon Assignee: Gokhan Capan Labels: gsoc2013, mentor Fix For: 1.0 Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch [via Ted Dunning] It should be possible to view a Lucene index as a matrix. This would require that we standardize on a way to convert documents to rows. There are many choices, the discussion of which should be deferred to the actual work on the project, but there are a few obvious constraints: a) it should be possible to get the same result as dumping the term vectors for each document each to a line and converting that result using standard Mahout methods. b) numeric fields ought to work somehow. c) if there are multiple text fields that ought to work sensibly as well. Two options include dumping multiple matrices or to convert the fields into a single row of a single matrix. d) it should be possible to refer back from a row of the matrix to find the correct document. THis might be because we remember the Lucene doc number or because a field is named as holding a unique id. e) named vectors and matrices should be used if plausible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968254#comment-13968254 ] Gokhan Capan commented on MAHOUT-1178: -- The thing is it just 'loads' a Lucene index in memory as a matrix. You construct a matrix with the lucene index directory location and that's it. So it is not a fix for incremental document management issue. The alternative approach is querying the index when a row/column vector, or cell is required. I, however, am not sure if the SolrMatrix thing is fast enough for that. I haven't been available lately, and now I'm reading through the changes in and proposals for Mahout's future, and trying to set up my perspective for Mahout2. We probably can come up with a better way of document storage (still Lucene/Solr based). Let me leave this as is now, and then we can discuss the input formats further. Is that OK for you? GSOC 2013: Improve Lucene support in Mahout --- Key: MAHOUT-1178 URL: https://issues.apache.org/jira/browse/MAHOUT-1178 Project: Mahout Issue Type: New Feature Reporter: Dan Filimon Assignee: Gokhan Capan Labels: gsoc2013, mentor Fix For: 1.0 Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch [via Ted Dunning] It should be possible to view a Lucene index as a matrix. This would require that we standardize on a way to convert documents to rows. There are many choices, the discussion of which should be deferred to the actual work on the project, but there are a few obvious constraints: a) it should be possible to get the same result as dumping the term vectors for each document each to a line and converting that result using standard Mahout methods. b) numeric fields ought to work somehow. c) if there are multiple text fields that ought to work sensibly as well. Two options include dumping multiple matrices or to convert the fields into a single row of a single matrix. d) it should be possible to refer back from a row of the matrix to find the correct document. THis might be because we remember the Lucene doc number or because a field is named as holding a unique id. e) named vectors and matrices should be used if plausible. -- This message was sent by Atlassian JIRA (v6.2#6252)
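Since the comment above only gestures at "constructing a matrix from a Lucene index directory location", here is a hedged, self-contained sketch of that idea. It is not the MAHOUT-1178 patch; it uses Lucene 4.x and mahout-math calls as I recall them (DirectoryReader, getTermVector, SparseRowMatrix, RandomAccessSparseVector), so signatures should be double-checked before reuse:

    import java.io.File
    import scala.collection.mutable
    import org.apache.lucene.index.DirectoryReader
    import org.apache.lucene.store.FSDirectory
    import org.apache.mahout.math.{Matrix, RandomAccessSparseVector, SparseRowMatrix}

    // Load the stored term vectors of one field into an in-memory Mahout matrix,
    // returning the matrix plus the term -> column dictionary.
    def luceneIndexAsMatrix(indexDir: String, field: String): (Matrix, Map[String, Int]) = {
      val reader = DirectoryReader.open(FSDirectory.open(new File(indexDir)))
      val numDocs = reader.maxDoc()
      val dict = mutable.LinkedHashMap[String, Int]()

      // pass 1: collect per-document term frequencies, building the dictionary as we go
      val docFreqs = (0 until numDocs).map { doc =>
        val freqs = mutable.Map[Int, Double]()
        val terms = reader.getTermVector(doc, field)   // null if no term vector was stored
        if (terms != null) {
          val te = terms.iterator(null)
          var term = te.next()
          while (term != null) {
            val col = dict.getOrElseUpdate(term.utf8ToString(), dict.size)
            freqs(col) = te.totalTermFreq().toDouble
            term = te.next()
          }
        }
        freqs
      }
      reader.close()

      // pass 2: materialize rows now that the number of columns is known
      val m = new SparseRowMatrix(numDocs, dict.size)
      docFreqs.zipWithIndex.foreach { case (freqs, row) =>
        val v = new RandomAccessSparseVector(dict.size)
        freqs.foreach { case (col, freq) => v.setQuick(col, freq) }
        m.assignRow(row, v)
      }
      (m, dict.toMap)
    }

The two-pass shape also shows why incremental document management is awkward here: the dictionary and column count are fixed once the matrix is built, which is the limitation the comment points out.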
[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918159#comment-13918159 ] Gokhan Capan commented on MAHOUT-1178: -- Let me get the pieces together and submit a patch in a few days. GSOC 2013: Improve Lucene support in Mahout --- Key: MAHOUT-1178 URL: https://issues.apache.org/jira/browse/MAHOUT-1178 Project: Mahout Issue Type: New Feature Reporter: Dan Filimon Assignee: Gokhan Capan Labels: gsoc2013, mentor Fix For: 1.0 Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch [via Ted Dunning] It should be possible to view a Lucene index as a matrix. This would require that we standardize on a way to convert documents to rows. There are many choices, the discussion of which should be deferred to the actual work on the project, but there are a few obvious constraints: a) it should be possible to get the same result as dumping the term vectors for each document each to a line and converting that result using standard Mahout methods. b) numeric fields ought to work somehow. c) if there are multiple text fields that ought to work sensibly as well. Two options include dumping multiple matrices or to convert the fields into a single row of a single matrix. d) it should be possible to refer back from a row of the matrix to find the correct document. THis might be because we remember the Lucene doc number or because a field is named as holding a unique id. e) named vectors and matrices should be used if plausible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914494#comment-13914494 ] Gokhan Capan commented on MAHOUT-1329: -- Sure I can. Although my vote would be passing the version, considering different distributions out there, people may want to build mahout against whatever hadoop2 distro they use (I am not very sure about my own argument actually, It would be great to hear a counter-argument) Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan updated MAHOUT-1329: - Resolution: Fixed Status: Resolved (was: Patch Available) Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911436#comment-13911436 ] Gokhan Capan commented on MAHOUT-1329: -- I committed this to trunk Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907480#comment-13907480 ] Gokhan Capan edited comment on MAHOUT-1329 at 2/21/14 9:52 AM: --- Sergey, I modified your patch and produced a new version. Looking into the dependency tree, it seems it builds against the correct hadoop version. (This may seem irrelevant when looking at the patch, but I had to set argLine to -Xmx1024m in order not the unit tests to fail because of an OOM) for hadoop version 1.2.1: mvn clean package for hadoop version 2.2.0: mvn clean package -Dhadoop2.version=2.2.0 I unit tested this for both versions and saw the tests passed, but I don't have access to a hadoop test environment currently, so could you guys test if this actually work (I'll do it tomorrow anyway)? Then we can commit it. was (Author: gokhancapan): Sergey, I modified your patch and produced a new version. Looking into the dependency tree, it seems it builds against the correct hadoop version. (This may seem irrelevant when looking at the patch, but I had to set argLine to -Xmx1024m in order not the unit tests to fail because of an OOM) for hadoop version 1.2.1: mvn clean package for hadoop version 2.2.0: mvn clean package -Dhadoop.version=2.2.0 I unit tested this for both versions and saw the tests passed, but I don't have access to a hadoop test environment currently, so could you guys test if this actually work (I'll do it tomorrow anyway)? Then we can commit it. Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
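For reference, "setting argLine to -Xmx1024m" refers to the heap of the JVM that Surefire forks for the unit tests; a hedged sketch of the corresponding pom.xml fragment (standard Surefire configuration, not the attached patch verbatim):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <argLine>-Xmx1024m</argLine>  <!-- heap for the forked test JVM, avoids the OOM mentioned above -->
      </configuration>
    </plugin>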
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908126#comment-13908126 ] Gokhan Capan commented on MAHOUT-1329: -- Yeah, you're right, edit coming. Did you manage to run jobs against the cluster? Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908126#comment-13908126 ] Gokhan Capan edited comment on MAHOUT-1329 at 2/21/14 9:59 AM: --- Yeah, you're right, edit coming. Did you manage to run jobs against the cluster [EDIT:Sorry I missed you mentioned that you ran the examples, great then] was (Author: gokhancapan): Yeah, you're right, edit coming. Did you manage to run jobs against the cluster? Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908443#comment-13908443 ] Gokhan Capan commented on MAHOUT-1329: -- Good news that I tried that too, on a 2.2.0 cluster. seqdir, seq2sparse, and kmeans worked without a problem. I'm gonna wait till Monday to commit this, in case folks want to verify that it works. Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907237#comment-13907237 ] Gokhan Capan commented on MAHOUT-1329: -- Hi Sergey, thank you for that, I am copying from MAHOUT-1354: Gokhan: Looks like when the hadoop-2 profile is activated, this patch fails to apply the hadoop-2 related dependencies to the integration and examples modules, even though they both depend on core and core depends on hadoop-2. For me, moving hadoop dependencies to the root solved the problem, but I think we wouldn't want that since hadoop is not a common dependency for all modules of the project. Ted: It is important to keep modules like mahout math free of the massive Hadoop dependency. I think pushing dependencies to the root is not something that we desire, but let me look into this further. Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Suneel Marthi Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan updated MAHOUT-1329: - Attachment: 1329-3.patch Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Suneel Marthi Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907480#comment-13907480 ] Gokhan Capan commented on MAHOUT-1329: -- Sergey, I modified your patch and produced a new version. Looking into the dependency tree, it seems it builds against the correct hadoop version. (This may seem irrelevant when looking at the patch, but I had to set argLine to -Xmx1024m in order not the unit tests to fail because of an OOM) for hadoop version 1.2.1: mvn clean package for hadoop version 2.2.0: mvn clean package -Dhadoop.version=2.2.0 I unit tested this for both versions and saw the tests passed, but I don't have access to a hadoop test environment currently, so could you guys test if this actually work (I'll do it tomorrow anyway)? Then we can commit it. Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Suneel Marthi Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan reassigned MAHOUT-1329: Assignee: Gokhan Capan (was: Suneel Marthi) Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Mahout on Spark?
I imagine in Mahout offering an option to the users to select from different execution engines (just like we currently do by giving M/R or sequential options), and starting from Spark. I am not sure what changes needed in the codebase, though. Maybe following MLI (or alike) and implementing some more stuff, such as common interfaces for iterating over data (the M/R way and the Spark way). IMO, another effort might be porting pre-online machine learning (such transforming text into vector based on the dictionary generated by seq2sparse before), machine learning based on mini-batches, and streaming summarization stuff in Mahout to Spark-Streaming. Best, Gokhan On Wed, Feb 19, 2014 at 10:45 AM, Dmitriy Lyubimov dlie...@gmail.comwrote: PS I am moving along cost optimizer for spark-backed DRMs on some multiplicative pipelines that is capable of figuring different cost-based rewrites and R-Like DSL that mixes in-core and distributed matrix representations and blocks but it is painfully slow, i really only doing it like couple nights in a month. It does not look like i will be doing it on company time any time soon (and even if i did, the company doesn't seem to be inclined to contribute anything I do anything new on their time). It is all painfully slow, there's no direct funding for it anywhere with no string attached. That probably will be primary reason why Mahout would not be able to get much traction compared to university-based contributions. On Wed, Feb 19, 2014 at 12:27 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Unfortunately methinks the prospects of something like Mahout/MLLib merge seem very unlikely due to vastly diverged approach to the basics of linear algebra (and other things). Just like one cannot grow single tree out of two trunks -- not easily, anyway. It is fairly easy to port (and subsequently beat) MLib at this point from collection of algorithms point of view. But IMO goal should be more MLI-like first, and port second. And be very careful with concepts. Something that i so far don't see happening with MLib. MLib seems to be old-style Mahout-like rush to become a collection of basic algorithms rather than coherent foundation. Admittedly, i havent looked very closely. On Tue, Feb 18, 2014 at 11:41 PM, Sebastian Schelter s...@apache.org wrote: I'm also convinced that Spark is a superior platform for executing distributed ML algorithms. We've had a discussion about a change from Hadoop to another platform some time ago, but at that point in time it was not clear which of the upcoming dataflow processing systems (Spark, Hyracks, Stratosphere) would establish itself amongst the users. To me it seems pretty obvious that Spark made the race. I concur with Ted, it would be great to have the communities work together. I know that at least 4 mahout committers (including me) are already following Spark's mailinglist and actively participating in the discussions. What are the ideas how a fruitful cooperation look like? Best, Sebastian PS: I ported LLR-based cooccurrence analysis (aka item-based recommendation) to Spark some time ago, but I haven't had time to test my code on a large dataset yet. I'd be happy to see someone help with that. On 02/19/2014 08:04 AM, Nick Pentreath wrote: I know the Spark/Mllib devs can occasionally be quite set in ways of doing certain things, but we'd welcome as many Mahout devs as possible to work together. It may be too late, but perhaps a GSoC project to look at a port of some stuff like co occurrence recommender and streaming k-means? 
N -- Sent from Mailbox for iPhone On Wed, Feb 19, 2014 at 3:02 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath nick.pentre...@gmail.comwrote: My (admittedly heavily biased) view is Spark is a superior platform overall for ML. If the two communities can work together to leverage the strengths of Spark, and the large amount of good stuff in Mahout (as well as the fantastic depth of experience of Mahout devs) I think a lot can be achieved! It makes a lot of sense that Spark would be better than Hadoop for ML purposes given that Hadoop was intended to do web-crawl kinds of things and Spark was intentionally built to support machine learning. Given that Spark has been announced by a majority of the Hadoop-based distribution vendors, it makes sense that maybe Mahout should jump in. I really would prefer it if the two communities (MLib/MLI and Mahout) could work more closely together. There is a lot of good to be had on both sides.
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906062#comment-13906062 ] Gokhan Capan commented on MAHOUT-1329: -- Is it OK to add hadoop dependencies to the project root, and to the math module (actually to all modules even they already depend on the core module)? I remember that's what we wanted to avoid Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Suneel Marthi Labels: patch Fix For: 1.0 Attachments: 1329.patch Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: MAHOUT 0.9 Release - New URL
Using CentOS 6.5 and hadoop 1.2.1, all passed. +1 from me Gokhan On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo ap@outlook.com wrote: a),b),c),d) all passed on CentOS for me Date: Thu, 23 Jan 2014 13:43:06 +0200 Subject: Re: MAHOUT 0.9 Release - New URL From: ssvinarc...@hortonworks.com To: dev@mahout.apache.org I did a), b), c), d) and all steps pass. +1 On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll gsing...@apache.org wrote: +1 from me. On Jan 22, 2014, at 5:55 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Fixed the issues that were reported this week and restored FP mining into the codebase. Here's the URL for the final release in staging:- https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/ The artifacts have been signed with the following key: https://people.apache.org/keys/committer/smarthi.asc a) Verify that u can unpack the release (tar or zip) b) Verify u r able to compile the distro c) Run through the unit tests: mvn clean test d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script. Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized. Grant Ingersoll | @gsingers http://www.lucidworks.com
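The four verification steps, written out as commands — a sketch only; the distribution file name follows the staging URL above, and the script shown is just one of several under examples/bin:

    tar xzf mahout-distribution-0.9.tar.gz && cd mahout-distribution-0.9   # a) unpack
    mvn clean test                                                         # b) compile, c) unit tests
    cd examples/bin && ./cluster-reuters.sh                                # d) run each example script,
                                                                           #    trying its different options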
Re: Mahout 0.9 release
+1 for 1.0. This is more challenging than expected (the old hadoop 0.23 profile support is misleading) Sent from my iPhone On Dec 19, 2013, at 19:48, Andrew Musselman andrew.mussel...@gmail.com wrote: +1 On Thu, Dec 19, 2013 at 9:20 AM, Suneel Marthi suneel_mar...@yahoo.comwrote: +1 Sent from my iPhone On Dec 19, 2013, at 12:17 PM, Frank Scholten fr...@frankscholten.nl wrote: I am looking at M-1329 (Support for Hadoop 2.x) as we speak. This change requires quite some testing and I prefer to push this to 1.0. I am thinking of creating a unit test that starts miniclusters for each versions and runs a job in them. On Thu, Dec 19, 2013 at 12:28 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: There's M-1329 that covers this. Hopefully it should make it for 0.9 Sent from my iPhone On Dec 18, 2013, at 6:20 PM, Isabel Drost-Fromm isa...@apache.org wrote: On Mon, 16 Dec 2013 23:16:36 +0200 Gokhan Capan gkhn...@gmail.com wrote: M-1354 (Support for Hadoop 2.x) - Patch available. Gokhan, any updates on this. Nope, still couldn't make it work. Should we push that for 1.0 then (if this is shortly before completion and there's too much in 1.0 to push for a release early next year, I'd also be happy to have a smaller release between now and Berlin Buzzwords that includes the fix...). Isabel
Re: Mahout 0.9 release
Gokhan On Mon, Dec 16, 2013 at 11:08 PM, Suneel Marthi suneel_mar...@yahoo.comwrote: Its time to freeze trunk the this week, here's the status of JIRAs:- Suneel -- M-1319 - Patch available, would appreciate if someone could review/test the patch before I commit to trunk. Pat - M-1288 Solr Recommender Pat, I see that you have the code in ur Github repo, could u create a patch that could be merged into Mahout trunk. Frank M-1364 (Upgrade to Lucene 4.6) - Patch available. Grant, do u have cycles to review this patch? Gokhan -- M-1354 (Support for Hadoop 2.x) - Patch available. Gokhan, any updates on this. Nope, still couldn't make it work. On Sunday, December 8, 2013 6:23 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: We need to freeze the trunk this coming week in preparation for 0.9 release, below are the pending JIRAs:- Wiki (not a show stopper for 0.9) - M-1245, M-1304, M-1305, M-1307, M-1326 Suneel --- M-1319 (i can work on this tomorrow) M-1265 (Multi Layer Perceptron) - Need to be merged into trunk, the code's available for review on ReviewBoard. It would help if another set of eyes reviewed the test cases (Isabel, Stevo.. ?) Pat M-1288 Solr Recommender (What's the status of this Pat, this needs to be in 0.9 Release.) Stevo --- M-1366 (this can be at time of 0.9 Release and has no impact on trunk) Frank M-1364 (Upgrade to Lucene 4.6) - Patch available. It would be nice to have this go in 0.9 The patch worked for me Frank, I agree that this needs to be reviewed by someone who's more familiar with Lucene. Gokhan -- M-1354 (Support for Hadoop 2.x) - Patch available. This is targeted for 1.0. The patch worked for me on Hadoop 1.2.1, it would be good if someone could try the patch on hadoop 2.x instance. Others -- M-1371 - This was reported on @user and a patch was submitted. If we don't hear from the author within this week, this can be deferred to 1.0 On Tuesday, December 3, 2013 8:13 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: JIRAs Update for 0.9 release:- Wiki - Isabel, Sebastian and other volunteers - M-1245, M-1304, M-1305, M-1307, M-1326 Suneel --- M-1319 M-1242 (Patch available to be committed to trunk) Pat --- M-1288 Solr Recommender Yexi, Suneel --- M-1265 - Multi Layer Perceptron Stevo, Isabel - M-1366 Andrew -- M-1030, M-1349 Ted -- M-1368 (Patch available to be committed to trunk) On Sunday, December 1, 2013 7:57 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Open JIRAs for 0.9 release :- Wiki - Isabel, Sebastian and other volunteers - M-1245, M-1304, M-1305, M-1307, M-1326 Suneel --- M-1319, M-1328 Pat --- M-1288 Solr Recommender Sebastian, Peng M-1286 Yexi, Suneel --- M-1265 - Multi Layer Perceptron Ted, do u have cycles to review this, the patch's up on Reviewboard. Stevo, Isabel - M-1366 - Please delete old releases from mirroring system M-1345 - Enable Randomized testing for all modules Andrew -- M-1030 Open Issues (any takers for these ???) M-1242 M-1349 On Friday, November 29, 2013 12:07 PM, Sebastian Schelter ssc.o...@googlemail.com wrote: On 29.11.2013 17:59, Suneel Marthi wrote: Open JIRAs for 0.9: Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - related to Wiki updates. Definitely appreciate more hands here to review/update the wiki M-1286 - Peng and Sebastian, no updates on this. Can this be included in 0.9? I will look into this over the weekend! M-1030 - Andrew Musselman M-1319, M-1328 - Suneel M-1347 - Suneel, patch has been committed to trunk. M-1265 - I have been working with Yexi on this. 
Ted, would u have time to review this; the code's on Reviewboard. M-1288 - Sole Recommender, Pat Ferrel M-1345: Isabel, Frank. I think we are good on this patch. Isabel, could u commit this to trunk? M-1312: Stevo, could u look at this? M-1349: Any takers for this?? Others: Spectral Kmeans clustering documentation (Shannon) On Thursday, November 28, 2013 10:38 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Adding Mahout-1349 to the list of JIRAs . On Thursday, November 28, 2013 10:37 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Update on Open JIRAs for 0.9: Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all related to Wiki updates, please see Isabel's updates. M-1286 - Peng and Sebastian, we had talked about this during the last hangout. Can this be included in 0.9? M-1030- Andrew Musselman,
[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842960#comment-13842960 ] Gokhan Capan commented on MAHOUT-1354: -- Looks like when the hadoop-2 profile is activated, this patch fails to apply the hadoop-2 related dependencies to the integration and examples modules, even though they both depend on core and core depends on hadoop-2. For me, moving hadoop dependencies to the root solved the problem, but I think we wouldn't want that since hadoop is not a common dependency for all modules of the project. CC'ing [~frankscholten] Mahout Support for Hadoop 2 Key: MAHOUT-1354 URL: https://issues.apache.org/jira/browse/MAHOUT-1354 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.0 Attachments: MAHOUT-1354_initial.patch Mahout support for Hadoop , now that Hadoop 2 is official. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843226#comment-13843226 ] Gokhan Capan commented on MAHOUT-1354: -- Yeah, I agree Mahout Support for Hadoop 2 Key: MAHOUT-1354 URL: https://issues.apache.org/jira/browse/MAHOUT-1354 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.0 Attachments: MAHOUT-1354_initial.patch Mahout support for Hadoop , now that Hadoop 2 is official. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Welcome to Frank Scholten as new Mahout committer
Congratulations, Frank! Gokhan On Tue, Dec 3, 2013 at 3:27 PM, Isabel Drost-Fromm isa...@apache.orgwrote: Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Frank Scholten to become committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA it also gives write access to the code repository. That also means that now we have yet another person who can commit patches submitted by others to our repo *wink* Frank, you've been following the project for quite some time now - contributing valuable changes over and over again. I certainly look forward to working with you in the future. Welcome! Isabel
[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837933#comment-13837933 ] Gokhan Capan commented on MAHOUT-1354: -- Today I had some troubles with integration's transitive dependencies, let me dig further. So this still should stay in 1.0 queue Mahout Support for Hadoop 2 Key: MAHOUT-1354 URL: https://issues.apache.org/jira/browse/MAHOUT-1354 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.0 Mahout support for Hadoop , now that Hadoop 2 is official. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836661#comment-13836661 ] Gokhan Capan commented on MAHOUT-1354: -- Do you think we should support hadoop-1 and hadoop-2 at the same time? Mahout Support for Hadoop 2 Key: MAHOUT-1354 URL: https://issues.apache.org/jira/browse/MAHOUT-1354 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.0 Mahout support for Hadoop , now that Hadoop 2 is official. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836953#comment-13836953 ] Gokhan Capan commented on MAHOUT-1354: -- Well, I tried something and want to share. Based on: In hadoop-2-stable, compatibility with hadoop-1 is preferred over with hadoop-2-alpha (http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html). For example, return type for ProgramDriver#driver(String) was void in hadoop-1 (which we use in MahoutDriver), int in hadoop-2-alpha, void again in hadoop-2-stable. It seems if we select the right artifacts, there is nothing to worry about the compatibility. My conclusion was: The current hadoop-0.20 and hadoop-0.23 profiles can be utilized: we can rename them to hadoop-1 and hadoop-2, respectively, then make hadoop-2 (stable) the default profile, then set the hadoop.version property to 2.2.0. We need to worry about some third party dependencies though, for instance, hbase-client in mahout-integration is dependent to hadoop-1 (for that particular artifact, simply excluding hadoop-core did not break any tests, by the way). Mahout Support for Hadoop 2 Key: MAHOUT-1354 URL: https://issues.apache.org/jira/browse/MAHOUT-1354 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.0 Mahout support for Hadoop , now that Hadoop 2 is official. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836965#comment-13836965 ] Gokhan Capan commented on MAHOUT-1354: -- Let me submit a patch first, probably tomorrow. Best Mahout Support for Hadoop 2 Key: MAHOUT-1354 URL: https://issues.apache.org/jira/browse/MAHOUT-1354 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.0 Mahout support for Hadoop , now that Hadoop 2 is official. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836102#comment-13836102 ] Gokhan Capan commented on MAHOUT-1286: -- Let's Won't Fix this issue. I think what we need to do is implementing more sparse matrix (or alike) data structures for different access patterns, other than the current map of maps approach. The ideas would apply to current 2 FastByIDMaps based DataModel. Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java, Semifinal-implementation-added.patch, benchmark.patch Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message was sent by Atlassian JIRA (v6.1#6144)
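A minimal sketch of the compact-arrays alternative to the map-of-maps layout discussed in this issue (a hypothetical class, not Mahout API): each row keeps a sorted int[] of column indices and a parallel double[] of values, so lookups are O(log n) binary searches and iteration is a cheap array walk.

    import java.util.Arrays;

    // Hypothetical compact sparse row: sorted column indices plus parallel values.
    final class CompactSparseRow {
      private int[] columns = new int[0];
      private double[] values = new double[0];

      // O(log n) lookup via binary search over the sorted column indices.
      double get(int column) {
        int pos = Arrays.binarySearch(columns, column);
        return pos >= 0 ? values[pos] : 0.0;
      }

      // Insert or overwrite, keeping the column array sorted.
      void set(int column, double value) {
        int pos = Arrays.binarySearch(columns, column);
        if (pos >= 0) {
          values[pos] = value;
          return;
        }
        int at = -pos - 1;
        int[] newColumns = new int[columns.length + 1];
        double[] newValues = new double[values.length + 1];
        System.arraycopy(columns, 0, newColumns, 0, at);
        System.arraycopy(values, 0, newValues, 0, at);
        newColumns[at] = column;
        newValues[at] = value;
        System.arraycopy(columns, at, newColumns, at + 1, columns.length - at);
        System.arraycopy(values, at, newValues, at + 1, values.length - at);
        columns = newColumns;
        values = newValues;
      }
    }

Such a row trades a small lookup cost for a large memory saving over nested hash maps, which is exactly the trade-off the issue description asks for.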
Re: [jira] [Updated] (MAHOUT-1365) Weighted ALS-WR iterator for Spark
I'll look into this too, possibly in two days. Sent from my iPhone On Nov 26, 2013, at 22:30, Dmitriy Lyubimov (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1365: - Attachment: distributed-als-with-confidence.pdf Weighted ALS-WR iterator for Spark -- Key: MAHOUT-1365 URL: https://issues.apache.org/jira/browse/MAHOUT-1365 Project: Mahout Issue Type: Task Reporter: Dmitriy Lyubimov Assignee: Dmitriy Lyubimov Fix For: Backlog Attachments: distributed-als-with-confidence.pdf Given preference P and confidence C distributed sparse matrices, compute the ALS-WR solution for implicit feedback (Spark Bagel version). Following the Hu-Koren-Volinsky method (stripping off any concrete methodology to build the C matrix), with a parameterized test for convergence. The computational scheme follows the ALS-WR method (which should be slightly more efficient for sparser inputs). The best performance will be achieved if non-sparse anomalies are prefiltered (eliminated), such as an anomalously active user which doesn't represent a typical user anyway. The work is going on here: https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala. I am porting away our (A1) implementation so there are a few issues associated with that. -- This message was sent by Atlassian JIRA (v6.1#6144)
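For reference, the Hu-Koren-Volinsky formulation referenced above reduces each half-iteration to independent regularized least-squares solves; a hedged sketch of the standard per-user update (the attached PDF may differ in details):

    x_u = (Y^\top C^u Y + \lambda I)^{-1} Y^\top C^u p(u)

where Y is the item-factor matrix, C^u is the diagonal confidence matrix for user u, and p(u) is the binary preference vector; the item update is symmetric. That per-row independence is what makes the distributed Spark iteration natural.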
[jira] [Comment Edited] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806106#comment-13806106 ] Gokhan Capan edited comment on MAHOUT-1286 at 10/26/13 2:13 PM: Peng, I am attaching a patch --not to be committed-- that includes some benchmarking code in case you need one, and 2 in-memory data models as a baseline. was (Author: gokhancapan): Peng, I am attaching a patch -not to be committed- that includes some benchmarking code in case you need one, and 2 in-memory data models as a baseline. Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: benchmark.patch, InMemoryDataModel.java, InMemoryDataModelTest.java, Semifinal-implementation-added.patch Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Capan updated MAHOUT-1286: - Attachment: benchmark.patch Peng, I am attaching a patch -not to be committed- that includes some benchmarking code in case you need one, and 2 in-memory data models as a baseline. Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: benchmark.patch, InMemoryDataModel.java, InMemoryDataModelTest.java, Semifinal-implementation-added.patch Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799916#comment-13799916 ] Gokhan Capan commented on MAHOUT-1178: -- Hi [~smarthi], Although I'm not sure whether there is still interest, I have Lucene matrix (in-memory) and Solr matrix (one that does not load the index into memory) implementations. I believe both can be committed after a couple of review rounds. GSOC 2013: Improve Lucene support in Mahout --- Key: MAHOUT-1178 URL: https://issues.apache.org/jira/browse/MAHOUT-1178 Project: Mahout Issue Type: New Feature Reporter: Dan Filimon Labels: gsoc2013, mentor Fix For: Backlog Attachments: MAHOUT-1178.patch, MAHOUT-1178-TEST.patch [via Ted Dunning] It should be possible to view a Lucene index as a matrix. This would require that we standardize on a way to convert documents to rows. There are many choices, the discussion of which should be deferred to the actual work on the project, but there are a few obvious constraints: a) it should be possible to get the same result as dumping the term vectors for each document, one per line, and converting that result using standard Mahout methods. b) numeric fields ought to work somehow. c) if there are multiple text fields, that ought to work sensibly as well. Two options include dumping multiple matrices or converting the fields into a single row of a single matrix. d) it should be possible to refer back from a row of the matrix to find the correct document. This might be because we remember the Lucene doc number or because a field is named as holding a unique id. e) named vectors and matrices should be used if plausible. -- This message was sent by Atlassian JIRA (v6.1#6144)
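As a hedged illustration of constraint (a) above — one document becomes one sparse row — the sketch below reads a stored term vector with the Lucene 4.x API and fills a Mahout RandomAccessSparseVector. The field name, the externally built term dictionary, and the class name are assumptions; this is not the attached patch.

    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;

    public class LuceneRowSketch {
      // Turns one document's stored term vector into a sparse row; 'dictionary'
      // maps each term to its column index and must be built over the whole index.
      public static Vector termFrequencyRow(IndexReader reader, int docId, String field,
          Map<String, Integer> dictionary, int numTerms) throws java.io.IOException {
        Vector row = new RandomAccessSparseVector(numTerms);
        Terms terms = reader.getTermVector(docId, field);  // null if no term vector was stored
        if (terms == null) {
          return row;
        }
        TermsEnum termsEnum = terms.iterator(null);
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
          Integer column = dictionary.get(term.utf8ToString());
          if (column != null) {
            row.setQuick(column, termsEnum.totalTermFreq());  // within-document frequency
          }
        }
        return row;
      }
    }

Constraint (d) is then just a matter of keeping the docId (or a stored unique-id field) alongside the produced row.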
Re: Mahout's future
I'll be traveling tomorrow, and will appreciate if the videos are gonna be accessible later. Best Sent from my iPhone On Oct 16, 2013, at 23:15, Suneel Marthi suneel_mar...@yahoo.com wrote: Thanks Dmitriy. Let me check if its possible to setup automatic calendar invites to PMC. I'll go ahead and send a hangout link for Thursday, Oct 16 from 6 - 7pm (Eastern Time). The purpose of this hangout would be to talk about Mahout 0.9 release which is tentatively being planned for Nov-Dec 2013. I'll send an email with what I see as being targeted for 0.9 and we can take it from there. There's been a discussion thread about Mahout Future Roadmap (interpreting this as post Mahout 0.9), we can get to that if time permits else we can have another hangout next week to talk about it. Suneel On Wednesday, October 16, 2013 4:05 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: 3 to 4 On Oct 16, 2013 1:02 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Dmitriy, what time works for you on thursday? On Wednesday, October 16, 2013 3:47 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Doesnt work for me. Friday is better, or thrusday earlier afternoon. I d also appreciate automatic calendar invitations to pmc if at all possible. D On Oct 14, 2013 10:21 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Will schedule a hangout for this Thursday - 7pm (Eastern Time) tentatively. I would like us to first discuss about Mahout 0.9 release, will send out an agenda once I schedule it. Regards, Suneel On Tuesday, October 15, 2013 12:24 AM, Saikat Kanjilal sxk1...@hotmail.com wrote: Following up , Suneel/Grant are we still on for meeting this week on a google hangout, would love to neet this week. From: sxk1...@hotmail.com To: dev@mahout.apache.org Subject: RE: Mahout's future Date: Sun, 6 Oct 2013 07:00:50 -0700 +1Can you send out a quick agenda (hopefully with my input incorporated) before the hangout?Regards Date: Sun, 6 Oct 2013 03:58:10 -0700 From: suneel_mar...@yahoo.com Subject: Re: Mahout's future To: dev@mahout.apache.org Grant would be available the week of Oct 14 for a hangout (tentatively). We could go ahead and schedule one next week if there's (and seems very much like it) enough response. I can go ahead and facilitate one. I will be 100% focused on Mahout from next week once I start at my new job from Monday. Regarding building something for Deep Learning, Yexi's patch for MLP (see M-1265) may be a good place to refactor/start thinking about the foundations. I guess Ted is alluring to build something like what's been described in the Google paper (see http://www.cs.toronto.edu/~ranzato/publications/DistBeliefNIPS2012_withAppendix.pdf ). Correct? Suneel From: Ted Dunning ted.dunn...@gmail.com To: dev@mahout.apache.org dev@mahout.apache.org Cc: dev@mahout.apache.org dev@mahout.apache.org Sent: Sunday, October 6, 2013 2:10 AM Subject: Re: Mahout's future Saikat These are all good suggestions. I would have a hard time suggesting a prioritization of them. Does anybody remember what grant said about having another hangout? 
Sent from my iPhone On Oct 6, 2013, at 7:15, Saikat Kanjilal sxk1...@hotmail.com wrote: I wanted to mention a few other things: 1) It might be useful to take and embed a few already productionalized use cases into the integration tests in mahout, this will help additional users get on board faster. 2) Deep learning is really interesting, however I'd like to help research some common use cases first before tying this into mahout. 3) It'd be good to put some thought into documenting when you would choose what type of algorithm given a production machine learning recommendation system to build, this would give more visibility for users into choosing the right mixture of algorithms to build a production ready recommender, often what I've found is that a bulk of the time in building productionalized recommenders is spent cleaning and filtering noisy data. 4) I'd like to also explore how to tie in machine learning algorithms into real time systems built using twitter storm (http://storm-project.net/), it seems that industry more and more is wanting to do real time analytics on the fly, I'm curious what type of algorithms we'd need for this and back propagate these into mahout. It'd be good to meet like minded devs together locally (Seattle) or over gtalk/conference to talk through possibilities. Regards From: ted.dunn...@gmail.com Date: Sat, 5 Oct 2013 18:13:40 -0700 Subject: Re: Mahout's future To: dev@mahout.apache.org On Sat, Oct 5, 2013 at 5:08 PM, Saikat Kanjilal sxk1...@hotmail.com wrote: Does it make sense to have a quick meeting of interested developers over google chat/conference rather than email to discuss and assign folks to specifics? Thoughts? Great idea. I think that Grant may have been
[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759021#comment-13759021 ] Gokhan Capan commented on MAHOUT-1286: -- There was a thread on updating int indices and double values in matrices, but there are simply too many consequences of that update that we can't deal with right now. Even if it is not an exact Matrix structure, we can start with 2d hash tables and proceed later. Let's start this. I tried to insert Netflix ratings into: i- DataModel backed by 2 matrices. ii- The one in this patch. Good news is insert performance is good enough. I am going to try gets and iterations, too. Tomorrow I am starting the 2d hash table based on your implementation with a matrix-like interface, I am going to share a github link with you. Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java, Semifinal-implementation-added.patch Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
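Purely to illustrate the kind of insert benchmark described in the comment above, a small harness that times setPreference against any Taste DataModel; the comma-separated userID,itemID,rating file format and the class name are assumptions, not the attached benchmark.patch.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.mahout.cf.taste.model.DataModel;

    public class InsertBenchmarkSketch {
      // Streams a comma-separated ratings file into setPreference and reports throughput.
      public static void timeInserts(DataModel model, String ratingsFile) throws Exception {
        long start = System.currentTimeMillis();
        long count = 0;
        BufferedReader in = new BufferedReader(new FileReader(ratingsFile));
        try {
          String line;
          while ((line = in.readLine()) != null) {
            String[] fields = line.split(",");
            model.setPreference(Long.parseLong(fields[0]),
                                Long.parseLong(fields[1]),
                                Float.parseFloat(fields[2]));
            count++;
          }
        } finally {
          in.close();
        }
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println(count + " inserts in " + elapsedMs + " ms");
      }
    }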
[jira] [Comment Edited] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759021#comment-13759021 ] Gokhan Capan edited comment on MAHOUT-1286 at 9/5/13 12:22 PM: --- Even if it is not an exact Matrix structure, we can start with 2d hash tables and proceed later. Let's start this. I tried to insert Netflix ratings into: i- DataModel backed by 2 matrices. ii- The one in this patch. Good news is insert performance is good enough. I am going to try gets and iterations, too. Tomorrow I am starting the 2d hash table based on your implementation with a matrix-like interface, I am going to share a github link with you. was (Author: gokhancapan): There was a thread on updating int indices and double values in matrices, but there are simply too many consequences of that update that we can't deal with right now. Even if it is not an exact Matrix structure, we can start with 2d hash tables and proceed later. Let's start this. I tried to insert Netflix ratings into: i- DataModel backed by 2 matrices. ii- The one in this patch. Good news is insert performance is good enough. I am going to try gets and iterations, too. Tomorrow I am starting the 2d hash table based on your implementation with a matrix-like interface, I am going to share a github link with you. Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java, Semifinal-implementation-added.patch Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757801#comment-13757801 ] Gokhan Capan commented on MAHOUT-1286: -- Here is what I think: 1- We should implement a matrix that uses your 2d Hopscotch hash table as the underlying data structure (or the current open addressing hash table implementation that already exists in Mahout, depending on benchmarks) 2- We should handle concurrency issues that might be introduced by that matrix implementation 3- We then can replace the FastByIDMap(s) with that matrix, trust at the underlying matrix for concurrent updates, and never create a PreferenceArray unless there is an iteration over users (or items) What do you think? Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java, Semifinal-implementation-added.patch Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
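On point 1, a hedged sketch of how the open-addressing primitives already in Mahout collections could back a 2d index: pack the (row, column) pair into a single long key and keep values in an OpenLongDoubleHashMap. The wrapper class below is hypothetical, and row/column indices are assumed to fit in 32 bits.

    import org.apache.mahout.math.map.OpenLongDoubleHashMap;

    // Hypothetical 2d index: combined long key over a primitive open-addressing map.
    final class TwoDHashSketch {
      private final OpenLongDoubleHashMap cells = new OpenLongDoubleHashMap();

      private static long key(int row, int column) {
        return (((long) row) << 32) | (column & 0xffffffffL);
      }

      void set(int row, int column, double value) {
        cells.put(key(row, column), value);
      }

      double get(int row, int column) {
        long k = key(row, column);
        return cells.containsKey(k) ? cells.get(k) : 0.0;
      }
    }

Whether this or the Hopscotch table wins would come down to exactly the kind of benchmark attached to this issue.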
[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751049#comment-13751049 ] Gokhan Capan commented on MAHOUT-1286: -- Hi Peng, could you submit the diff files instead of .javas? That would be more convenient for me if it is possible. Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Assignee: Sean Owen Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751053#comment-13751053 ] Gokhan Capan commented on MAHOUT-1286: -- By the way, it seems the link to the paper is broken, if it is not just me. Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Assignee: Sean Owen Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: You are invited to Apache Mahout meet-up
Have a great day! On Aug 22, 2013, at 8:44 PM, Piero Giacomelli pgiac...@gmail.com wrote: Me too, so any online material could be very helpful. On Aug 22, 2013 19:31, Peng Cheng pc...@uowmail.edu.au wrote: Is the presentation going to be uploaded on Youtube or Slideshare? Sorry I cannot be there. On 13-08-22 08:46 AM, Yexi Jiang wrote: A great event. I wish I were in the Bay area. 2013/8/22 Shannon Quinn squ...@gatech.edu I'm only sorry I'm not in the Bay area. Sounds great! On 8/22/13 3:38 AM, Stevo Slavić wrote: Retweeted meetup invite. Have fun! Kind regards, Stevo Slavic. On Thu, Aug 22, 2013 at 8:34 AM, Ted Dunning ted.dunn...@gmail.com wrote: Very cool. Would love to see folks turn out for this. On Wed, Aug 21, 2013 at 9:38 PM, Ellen Friedman b.ellen.fried...@gmail.com wrote: The Apache Mahout user group has been re-activated. If you are in the Bay Area in California, join us on Aug 27 (Redwood City). Sebastian Schelter will be the main speaker, talking about new directions with Mahout recommendation. Grant Ingersoll, Ted Dunning and I will be there to do a short introduction for the meet-up and update on the 0.8 release. Here's the link to rsvp: http://bit.ly/16K32hg Hope you can come, and please spread the word. Ellen
[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737267#comment-13737267 ] Gokhan Capan commented on MAHOUT-1286: -- Peng, With a SparseRowMatrix, column access (getPreferencesForItem) is slow, but row access is pretty fast (getPreferencesFromUser). I agree with all other problems you mentioned. In Mahout's SVD-based recommenders and FactorizablePreferences, while computing top-N recommendations, I believe we compute activeUser,item predictions for each item, and return the top-N. So basically, an SVD-based recommender needs fast access to the rows of the matrix, but not the columns (it still needs to iterate over item ids, though). Column access is only needed in an item-based recommender, or if a CandidateItemsStrategy is used. In my tests for Netflix data, I saw a 3G heap, too. Let me compare this particular approach with the SparseRowMatrix-backed one. I will investigate your approach further. Ted, Additionally, I recently implemented a read-only SolrMatrix, which might be beneficial while implementing the SolrRecommender, if we want to use the existing Mahout library for similarities etc. I will open a new thread for that. Best Memory-efficient DataModel, supporting fast online updates and element-wise iteration - Key: MAHOUT-1286 URL: https://issues.apache.org/jira/browse/MAHOUT-1286 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Affects Versions: 0.9 Reporter: Peng Cheng Assignee: Sean Owen Labels: collaborative-filtering, datamodel, patch, recommender Fix For: 0.9 Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java Original Estimate: 336h Remaining Estimate: 336h Most DataModel implementation in current CF component use hash map to enable fast 2d indexing and update. This is not memory-efficient for big data set. e.g. Netflix prize dataset takes 11G heap space as a FileDataModel. Improved implementation of DataModel should use more compact data structure (like arrays), this can trade a little of time complexity in 2d indexing for vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this in training process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
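To make the access-pattern point concrete, a sketch (plain arrays, hypothetical names) of the top-N loop of a factorization-based recommender: it touches the active user's factor row once and one item factor row per candidate, i.e. row access only, never column access.

    // Hypothetical top-N scoring over dense factor arrays.
    final class TopNSketch {
      // Predicted preference = dot product of user factor row and item factor row.
      static double predict(double[] userRow, double[] itemRow) {
        double dot = 0.0;
        for (int f = 0; f < userRow.length; f++) {
          dot += userRow[f] * itemRow[f];
        }
        return dot;
      }

      // Scores every item using row access only, then selects the best howMany.
      static int[] recommend(double[] userRow, double[][] itemFactors, int howMany) {
        int n = Math.min(howMany, itemFactors.length);
        double[] scores = new double[itemFactors.length];
        for (int item = 0; item < itemFactors.length; item++) {
          scores[item] = predict(userRow, itemFactors[item]);
        }
        int[] top = new int[n];
        boolean[] taken = new boolean[itemFactors.length];
        for (int k = 0; k < n; k++) {
          int best = -1;
          for (int item = 0; item < itemFactors.length; item++) {
            if (!taken[item] && (best < 0 || scores[item] > scores[best])) {
              best = item;
            }
          }
          top[k] = best;
          taken[best] = true;
        }
        return top;
      }
    }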
Re: Regarding Online Recommenders
Ok, I tested the MatrixBackedDataModel, and the heap size is reduced to 7G for the Netflix Data, still large. The same history is encoded in 2 SparseRowMatrices, one is row-indexed by users and one is by item. It has serious concurrency issues at several places, though (sets and removes need to be thread-safe). Best Gokhan On Sat, Jul 20, 2013 at 12:15 AM, Peng Cheng pc...@uowmail.edu.au wrote: Hi, Just one simple question: Is the org.apache.mahout.math.**BinarySearch.binarySearch() function an optimized version of Arrays.binarySearch()? If it is not, why implement it again? Yours Peng On 13-07-17 06:31 PM, Sebastian Schelter wrote: You are completely right, the simple interface would only be usable for readonly / batch-updatable recommenders. Online recommenders might need something different. I tried to widen the discussion here to discuss all kinds of API changes in the recommenders that would be necessary in the future. 2013/7/17 Peng Cheng pc...@uowmail.edu.au One thing that suddenly comes to my mind is that, for a simple interface like FactorizablePreferences, maybe sequential READ in real time is possible, but sequential WRITE in O(1) time is Utopia. Because you need to flush out old preference with same user and item ID (in worst case it could be an interpolation search), otherwise you are permitting a user rating an item twice with different values. Considering how FileDataModel suppose to work (new files flush old files), maybe using the simple interface has less advantages than we used to believe. On 13-07-17 04:58 PM, Sebastian Schelter wrote: Hi Peng, I never wanted to discard the old interface, I just wanted to split it up. I want to have a simple interface that only supports sequential access (and allows for very memory efficient implementions, e.g. by the use of primitive arrays). DataModel should *extend* this interface and provide sequential and random access (basically what is already does). Than a recommender such as SGD could state that it only needs sequential access to the preferences and you can either feed it a DataModel (so we dont break backwards compatibility) or a memory efficient sequential access thingy. Does that make sense for you? 2013/7/17 Peng Cheng pc...@uowmail.edu.au I see, OK so we shouldn't use the old implementation. But I mean, the old interface doesn't have to be discarded. The discrepancy between your FactorizablePreferences and DataModel is that, your model supports getPreferences(), which returns all preferences as an iterator, and DataModel supports a few old functions that returns preferences for an individual user or item. My point is that, it is not hard for each of them to implement what they lack of: old DataModel can implement getPreferences() just by a a loop in abstract class. Your new FactorizablePreferences can implement those old functions by a binary search that takes O(log n) time, or an interpolation search that takes O(log log n) time in average. So does the online update. It will just be a matter of different speed and space, but not different interface standard, we can use old unit tests, old examples, old everything. And we will be more flexible in writing ensemble recommender. Just a few thoughts, I'll have to validate the idea first before creating a new JIRA ticket. Yours Peng On 13-07-16 02:51 PM, Sebastian Schelter wrote: I completely agree, Netflix is less than one gigabye in a smart representation, 12x more memory is a nogo. 
The techniques used in FactorizablePreferences allow a much more memory efficient representation, tested on KDD Music dataset which is approx 2.5 times Netflix and fits into 3GB with that approach. 2013/7/16 Ted Dunning ted.dunn...@gmail.com Netflix is a small dataset. 12G for that seems quite excessive. Note also that this is before you have done any work. Ideally, 100million observations should take 1GB. On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng pc...@uowmail.edu.au wrote: The second idea is indeed splendid, we should separate time-complexity first and space-complexity first implementation. What I'm not quite sure, is that if we really need to create two interfaces instead of one. Personally, I think 12G heap space is not that high right? Most new laptop can already handle that (emphasis on laptop). And if we replace hash map (the culprit of high memory consumption) with list/linkedList, it would simply degrade time complexity for a linear search to O(n), not too bad either. The current DataModel is a result of careful thoughts and has underwent extensive test, it is easier to expand on top of it instead of subverting it.
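A hedged sketch of the two-matrix layout being tested here, using org.apache.mahout.math.SparseRowMatrix; the wrapper class and the coarse synchronization are illustrative only, and ids are assumed to be pre-mapped to dense int indices. Row views make both per-user and per-item access cheap, at the cost of writing every preference twice and keeping the two matrices consistent under concurrent updates.

    import org.apache.mahout.math.Matrix;
    import org.apache.mahout.math.SparseRowMatrix;
    import org.apache.mahout.math.Vector;

    // Hypothetical core of a matrix-backed preference store.
    final class DualMatrixPreferences {
      private final Matrix byUser;  // rows = users, columns = items
      private final Matrix byItem;  // rows = items, columns = users

      DualMatrixPreferences(int numUsers, int numItems) {
        this.byUser = new SparseRowMatrix(numUsers, numItems);
        this.byItem = new SparseRowMatrix(numItems, numUsers);
      }

      // Coarse-grained lock: the two matrices must never disagree.
      synchronized void setPreference(int userIndex, int itemIndex, double value) {
        byUser.setQuick(userIndex, itemIndex, value);
        byItem.setQuick(itemIndex, userIndex, value);
      }

      synchronized void removePreference(int userIndex, int itemIndex) {
        byUser.setQuick(userIndex, itemIndex, 0.0);
        byItem.setQuick(itemIndex, userIndex, 0.0);
      }

      Vector preferencesFromUser(int userIndex) {
        return byUser.viewRow(userIndex);  // fast: a row view, no copying
      }

      Vector preferencesForItem(int itemIndex) {
        return byItem.viewRow(itemIndex);  // fast for the same reason
      }
    }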
Re: MongoDBDataModel additions
Paul, Actually we are now working on an OnlineRecommender, and we plan to support new users and items. You can find the discussion in the Regarding Online Recommenders thread on the dev list. You may want to take a look at it. Best, Gokhan On Mon, Jul 22, 2013 at 8:40 AM, Paul Scott pscott...@gmail.com wrote: On 19/07/2013 19:40, Gokhan Capan wrote: Hi Paul, I am sure Sebastian will provide further information, but there was a JIRA ticket that you may find relevant. https://issues.apache.org/jira/browse/MAHOUT-1050 Thanks! OK, so the data model is immutable because of constant refreshing. Seems OK to me, although it may be a bit heavy with many millions of users, no? Anyway, I will leave it for now and look at other ways to help out this awesome project! Thanks for the reply and link -- Paul -- http://paulscott.co.za/blog/
Re: MongoDBDataModel additions
Hi Paul, I am sure Sebastian will provide further information, but there was a JIRA ticket that you may find relevant. https://issues.apache.org/jira/browse/MAHOUT-1050 Best Gokhan On Fri, Jul 19, 2013 at 9:43 AM, Paul Scott pscott...@gmail.com wrote: Hi all, Let me do a quick introduction. I am Paul and I work at DStv Online in South Africa. I would normally lurk on a list a lot longer than this, but I do feel that I can contribute almost immediately. Please excuse me if I am at all out of bounds here... I have noticed that in the MongoDBDataModel in mahout-integration that the methods: public void setPreference(long userID, long itemID, float value) and public void removePreference(long userID, long itemID) both throw UnsupportedOperationExceptions. Is this by design, or can I actually implement these methods and send through a patch? Also, obviously, I would need to open a Jira ticket. Do I need to sign up for that or what is the process there? As a second contribution, I would also like to start exploring/discussing a Neo4jDataModel for working with the Neo4j Graph database. Again, apologies if this has already been discussed, but I couldn't find any other references to this online. Many thanks! -- Paul http://paulscott.co.za/blog
Re: Regarding Online Recommenders
handled? Do you plan to require batch model refactorization for any update? Or perform some partial update by maybe just transforming new data into the LF space already in place then doing full refactorization every so often in batch mode? By 'anonymous users' I mean users with some history that is not yet incorporated in the LF model. This could be history from a new user asked to pick a few items to start the rec process, or an old user with some new action history not yet in the model. Are you going to allow for passing the entire history vector or userID+incremental new history to the recommender? I hope so. For what it's worth we did a comparison of Mahout Item based CF to Mahout ALS-WR CF on 2.5M users and 500K items with many M actions over 6 months of data. The data was purchase data from a diverse ecom source with a large variety of products from electronics to clothes. We found Item based CF did far better than ALS. As we increased the number of latent factors the results got better but were never within 10% of item based (we used MAP as the offline metric). Not sure why but maybe it has to do with the diversity of the item types. I understand that a full item based online recommender has very different tradeoffs and anyway others may not have seen this disparity of results. Furthermore we don't have A/B test results yet to validate the offline metric. On Jul 16, 2013, at 2:41 PM, Gokhan Capan gkhn...@gmail.com wrote: Peng, This is the reason I separated out the DataModel, and only put the learner stuff there. The learner I mentioned yesterday just stores the parameters, (noOfUsers+noOfItems)***noOfLatentFactors, and does not care where preferences are stored. I, kind of, agree with the multi-level DataModel approach: One for iterating over all preferences, one for if one wants to deploy a recommender and perform a lot of top-N recommendation tasks. (Or one DataModel with a strategy that might reduce existing memory consumption, while still providing fast access, I am not sure. Let me try a matrix-backed DataModel approach) Gokhan On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter s...@apache.org wrote: I completely agree, Netflix is less than one gigabye in a smart representation, 12x more memory is a nogo. The techniques used in FactorizablePreferences allow a much more memory efficient representation, tested on KDD Music dataset which is approx 2.5 times Netflix and fits into 3GB with that approach. 2013/7/16 Ted Dunning ted.dunn...@gmail.com Netflix is a small dataset. 12G for that seems quite excessive. Note also that this is before you have done any work. Ideally, 100million observations should take 1GB. On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng pc...@uowmail.edu.au wrote: The second idea is indeed splendid, we should separate time-complexity first and space-complexity first implementation. What I'm not quite sure, is that if we really need to create two interfaces instead of one. Personally, I think 12G heap space is not that high right? Most new laptop can already handle that (emphasis on laptop). And if we replace hash map (the culprit of high memory consumption) with list/linkedList, it would simply degrade time complexity for a linear search to O(n), not too bad either. The current DataModel is a result of careful thoughts and has underwent extensive test, it is easier to expand on top of it instead of subverting it.
Re: Regarding Online Recommenders
It is 2 SparseRowMatrices, Peng. But I don't want to comment on it before actually trying it. This is essentially a first step for me to choose my side on the DataModel implementation discussion:) Gokhan On Fri, Jul 19, 2013 at 2:25 AM, Peng Cheng pc...@uowmail.edu.au wrote: Wow, that's lightning fast. Is it a SparseMatrix or DenseMatrix? On 13-07-18 07:23 PM, Gokhan Capan wrote: I just started to implement a Matrix backed data model and pushed it, to check the performance and memory considerations. I believe I can try it on some data tomorrow. Best Gokhan On Thu, Jul 18, 2013 at 11:05 PM, Peng Cheng pc...@uowmail.edu.au wrote: I see, sorry I was too presumptuous. I only recently worked and tested SVDRecommender, never could have known its efficiency using an item-based recommender. Maybe there is space for algorithmic optimization. The online recommender Gokhan is working on is also an SVDRecommender. An online user-based or item-based recommender based on clustering technique would definitely be critical, but we need an expert to volunteer :) Perhaps Dr Dunning can have a few words? He announced the online clustering component. Yours Peng On 13-07-18 03:54 PM, Pat Ferrel wrote: No it was CPU bound not memory. I gave it something like 14G heap. It was running, just too slow to be of any real use. We switched to the hadoop version and stored precalculated recs in a db for every user. On Jul 18, 2013, at 12:06 PM, Peng Cheng pc...@uowmail.edu.au wrote: Strange, its just a little bit larger than limibseti dataset (17m ratings), did you encountered an outOfMemory or GCTimeOut exception? Allocating more heap space usually help. Yours Peng On 13-07-18 02:27 PM, Pat Ferrel wrote: It was about 2.5M users and 500K items with 25M actions over 6 months of data. On Jul 18, 2013, at 10:15 AM, Peng Cheng pc...@uowmail.edu.au wrote: If I remember right, a highlight of 0.8 release is an online clustering algorithm. I'm not sure if it can be used in item-based recommender, but this is definitely I would like to pursue. It's probably the only advantage a non-hadoop implementation can offer in the future. Many non-hadoop recommenders are pretty fast. But existing in-memory GenericDataModel and FileDataModel are largely implemented for sandboxes, IMHO they are the culprit of scalability problem. May I ask about the scale of your dataset? how many rating does it have? Yours Peng On 13-07-18 12:14 PM, Sebastian Schelter wrote: Well, with itembased the only problem is new items. New users can immediately be served by the model (although this is not well supported by the API in Mahout). For the majority of usecases I saw, it is perfectly fine to have a short delay until new items enter the recommender, usually this happens after a retraining in batch. You have to care for cold-start and collect some interactions anyway. 2013/7/18 Pat Ferrel pat.fer...@gmail.com Yes, what Myrrix does is good. My last aside was a wish for an item-based online recommender not only factorized. Ted talks about using Solr for this, which we're experimenting with alongside Myrrix. I suspect Solr works but it does require a bit of tinkering and doesn't have quite the same set of options--no llr similarity for instance. On the same subject I recently attended a workshop in Seattle for UAI2013 where Walmart reported similar results using a factorized recommender. They had to increase the factor number past where it would perform well. Along the way they saw increasing performance measuring precision offline. 
They eventually gave up on a factorized solution. This decision seems odd but anyway… In the case of Walmart and our data set they are quite diverse. The best idea is probably to create different recommenders for separate parts of the catalog but if you create one model on all items our intuition is that item-based works better than factorized. Again caveat--no A/B tests to support this yet. Doing an online item-based recommender would quickly run into scaling problems, no? We put together the simple Mahout in-memory version and it could not really handle more than a down-sampled few months of our data. Down-sampling lost us 20% of our precision scores so we moved to the hadoop version. Now we have use-cases for an online recommender that handles anonymous new users and that takes the story full circle. On Jul 17, 2013, at 1:28 PM, Sebastian Schelter s...@apache.org wrote: Hi Pat I think we should provide a simple support for recommending to anonymous users. We should have a method recommendToAnonymous() that takes a PreferenceArray as argument. For itembased recommenders, its straightforward to compute recommendations, for userbased you have to search through all users once, for latent factor models, you have to fold the user vector into the low dimensional space. I think Sean already added
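A hedged aside on the "fold the user vector into the low dimensional space" step mentioned above (the textbook ridge-regression fold-in, not necessarily what any particular Mahout recommender does): with V_I the item-factor rows of the items the anonymous user has acted on and r_I the corresponding preference values, an approximate user factor is

    \hat{u} = (V_I^\top V_I + \lambda I)^{-1} V_I^\top r_I

after which recommendation proceeds exactly as for a known user, by scoring items against \hat{u}. The solve is only numFactors x numFactors, so it is cheap enough to do per request.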
Re: Regarding Online Recommenders
Hi Pat, please see my response inline. Best, Gokhan On Wed, Jul 17, 2013 at 8:23 PM, Pat Ferrel pat.fer...@gmail.com wrote: May I ask how you plan to support model updates and 'anonymous' users? I assume the latent factors model is calculated offline still in batch mode, then there are periodic updates? How are the updates handled? If you are referring to the recommender of discussion here, no, updating the model can be done with a single preference, using stochastic gradient descent, by updating the particular user and item factors simultaneously. Do you plan to require batch model refactorization for any update? Or perform some partial update by maybe just transforming new data into the LF space already in place then doing full refactorization every so often in batch mode? By 'anonymous users' I mean users with some history that is not yet incorporated in the LF model. This could be history from a new user asked to pick a few items to start the rec process, or an old user with some new action history not yet in the model. Are you going to allow for passing the entire history vector or userID+incremental new history to the recommender? I hope so. For what it's worth we did a comparison of Mahout Item based CF to Mahout ALS-WR CF on 2.5M users and 500K items with many M actions over 6 months of data. The data was purchase data from a diverse ecom source with a large variety of products from electronics to clothes. We found Item based CF did far better than ALS. As we increased the number of latent factors the results got better but were never within 10% of item based (we used MAP as the offline metric). Not sure why but maybe it has to do with the diversity of the item types. My first question, are those actions are only positive, like purchase as you mentioned? I understand that a full item based online recommender has very different tradeoffs and anyway others may not have seen this disparity of results. Furthermore we don't have A/B test results yet to validate the offline metric. I personally think an A/B test is the best way to evaluate a recommender, and if you will be able to share it, I personally look forward to see the results. I believe that would be a great contribution for some future decisions. On Jul 16, 2013, at 2:41 PM, Gokhan Capan gkhn...@gmail.com wrote: Peng, This is the reason I separated out the DataModel, and only put the learner stuff there. The learner I mentioned yesterday just stores the parameters, (noOfUsers+noOfItems)*noOfLatentFactors, and does not care where preferences are stored. I, kind of, agree with the multi-level DataModel approach: One for iterating over all preferences, one for if one wants to deploy a recommender and perform a lot of top-N recommendation tasks. (Or one DataModel with a strategy that might reduce existing memory consumption, while still providing fast access, I am not sure. Let me try a matrix-backed DataModel approach) Gokhan On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter s...@apache.org wrote: I completely agree, Netflix is less than one gigabye in a smart representation, 12x more memory is a nogo. The techniques used in FactorizablePreferences allow a much more memory efficient representation, tested on KDD Music dataset which is approx 2.5 times Netflix and fits into 3GB with that approach. 2013/7/16 Ted Dunning ted.dunn...@gmail.com Netflix is a small dataset. 12G for that seems quite excessive. Note also that this is before you have done any work. Ideally, 100million observations should take 1GB. 
On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng pc...@uowmail.edu.au wrote: The second idea is indeed splendid, we should separate time-complexity first and space-complexity first implementation. What I'm not quite sure, is that if we really need to create two interfaces instead of one. Personally, I think 12G heap space is not that high right? Most new laptop can already handle that (emphasis on laptop). And if we replace hash map (the culprit of high memory consumption) with list/linkedList, it would simply degrade time complexity for a linear search to O(n), not too bad either. The current DataModel is a result of careful thoughts and has underwent extensive test, it is easier to expand on top of it instead of subverting it.
Re: Regarding Online Recommenders
Peng, This is the reason I separated out the DataModel, and only put the learner stuff there. The learner I mentioned yesterday just stores the parameters, (noOfUsers+noOfItems)*noOfLatentFactors, and does not care where preferences are stored. I, kind of, agree with the multi-level DataModel approach: One for iterating over all preferences, one for if one wants to deploy a recommender and perform a lot of top-N recommendation tasks. (Or one DataModel with a strategy that might reduce existing memory consumption, while still providing fast access, I am not sure. Let me try a matrix-backed DataModel approach) Gokhan On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter s...@apache.org wrote: I completely agree, Netflix is less than one gigabye in a smart representation, 12x more memory is a nogo. The techniques used in FactorizablePreferences allow a much more memory efficient representation, tested on KDD Music dataset which is approx 2.5 times Netflix and fits into 3GB with that approach. 2013/7/16 Ted Dunning ted.dunn...@gmail.com Netflix is a small dataset. 12G for that seems quite excessive. Note also that this is before you have done any work. Ideally, 100million observations should take 1GB. On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng pc...@uowmail.edu.au wrote: The second idea is indeed splendid, we should separate time-complexity first and space-complexity first implementation. What I'm not quite sure, is that if we really need to create two interfaces instead of one. Personally, I think 12G heap space is not that high right? Most new laptop can already handle that (emphasis on laptop). And if we replace hash map (the culprit of high memory consumption) with list/linkedList, it would simply degrade time complexity for a linear search to O(n), not too bad either. The current DataModel is a result of careful thoughts and has underwent extensive test, it is easier to expand on top of it instead of subverting it.
Regarding Online Recommenders
Based on the conversation in MAHOUT-1274, I put some code here: https://github.com/gcapan/mahout/tree/onlinerec I hope that would initiate a discussion on OnlineRecommender approaches. I think the OnlineRecommender would require (similar to what Sebastian commented there): 1- A DataModel that allows adding new users/items and performs fast iteration 2- An online learning interface that allows updating the model with a feedback, and make predictions based on the latest model The code is a very early effort for the latter, and it contains a matrix factorization-based implementation where training is done by SGD. The model is stored in a DenseMatrix --it should be replaced with a matrix that allows adding new rows and doesn't allocate space for empty rows (please search for DenseRowMatrix and BlockSparseMatrix in the dev-list, and see MAHOUT-1193 for relevant issue). I didn't try that on a dataset yet. The DataModel I imagine would follow the current API, where underlying preference storage is replaced with a matrix. A Recommender would then use the DataModel and the OnlineLearner, where Recommender#setPreference is delegated to DataModel#setPreference (like it does now), and DataModel#setPreference triggers OnlineLearner#train. Gokhan
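As a concrete, hedged illustration of the "update the particular user and item factors simultaneously" step described above — textbook regularized matrix-factorization SGD, with hypothetical names rather than the API of the linked branch:

    // Hypothetical single-preference SGD step over dense factor arrays.
    final class SgdStepSketch {
      private final double[][] userFactors;  // numUsers x numFactors
      private final double[][] itemFactors;  // numItems x numFactors
      private final double learningRate;
      private final double lambda;           // L2 regularization

      SgdStepSketch(double[][] userFactors, double[][] itemFactors,
                    double learningRate, double lambda) {
        this.userFactors = userFactors;
        this.itemFactors = itemFactors;
        this.learningRate = learningRate;
        this.lambda = lambda;
      }

      // Called once per observed (user, item, rating) preference.
      void train(int user, int item, double rating) {
        double[] u = userFactors[user];
        double[] v = itemFactors[item];
        double predicted = 0.0;
        for (int f = 0; f < u.length; f++) {
          predicted += u[f] * v[f];
        }
        double err = rating - predicted;
        for (int f = 0; f < u.length; f++) {
          double uf = u[f];
          double vf = v[f];
          u[f] += learningRate * (err * vf - lambda * uf);
          v[f] += learningRate * (err * uf - lambda * vf);
        }
      }
    }

An OnlineLearner#train(user, item, rating) delegating to something like this is all that DataModel#setPreference would need to trigger.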
Re: Welcome new committers Gokhan Capan and Stevo Slavic
Hi, Sorry I was on a vacation. Congratulations, Stevo! I think being a Mahout committer is a big deal, and I am really pleased that I am one now. I am a Researcher at Anadolu University, Turkey, and a Data Scientist at Dilisim, a company specialized in IR, NLP, and Data Science solutions. I hope I can participate well to committers' great efforts to empower users to perform massive, real-world machine learning. Thank you very much. Best regards, Gokhan On Tue, Jun 11, 2013 at 12:39 PM, Dmitriy Lyubimov dlie...@gmail.comwrote: congratulations! On Mon, Jun 10, 2013 at 10:22 PM, Dan Filimon dangeorge.fili...@gmail.comwrote: Congratulations to the both of you! :) It's great to have you on board! On Tue, Jun 11, 2013 at 3:58 AM, Stevo Slavić ssla...@gmail.com wrote: Thanks Grant, Suneel and rest of the team, I'm a Java software developer and OSS enthusiast from Serbia with 7 years of professional experience in IT industry. Together with teams I've been part of, I have designed, built and successfully delivered multiple applications and websites from various business domains (online media, e-government, telecommunications, e-commerce). In both small and large enterprise scale apps, open source technologies and communities around them were and remain to be one of the key components and ingredients for success. It's always a great pleasure for me to give back to OSS projects that I use, through submitting patches or just being good community member. So far I've contributed to and been involved the most on Spring framework and other associated projects from the Spring portfolio. Back in April last year I rediscovered my passion and interest in machine learning, AI and computer science in general through prof. Andrew Ng's Coursera machine learning MOOC https://www.coursera.org/course/ml which I successfully completed http://bit.ly/sslavic-coursera-ml. Going from ML theory to practice, through the mist of Big Data hype, lead me to the greatness of Apache Mahout project. You all do me great honor by accepting me into the team, team of exceptional individuals yet great team players, with such positive and creative atmosphere. My contributions to the project so far were rather limited, and in near future they are likely to remain so as I still have lots to learn first. At least in the beginning, more than anything else I expect that I'll be able to contribute to the project by making it even more approachable to general audience of IT practitioners like myself through actively promoting it, supporting users on the mailing list to my best, and working on the documentation. Level of commitment will surely increase with time. I thank you all once more for this wonderful opportunity, and wish us and the project lots of success! Kind regards, Stevo Slavic. On Tue, Jun 11, 2013 at 1:10 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Congrats Gokhan and Stevo!! From: Grant Ingersoll gsing...@apache.org To: dev@mahout.apache.org dev@mahout.apache.org Sent: Monday, June 10, 2013 5:04 PM Subject: Welcome new committers Gokhan Capan and Stevo Slavic Please join me in congratulating Mahout's newest committers, Gokhan Capan and Stevo Slavic, both of whom have been contributing to Mahout for some time now. Gokhan, Stevo, new committer tradition is to give a brief background on yourself, so you have the floor! Congrats, Grant
HBase backed matrices
Hi, For taking large matrices as input and persisting large models (like factor models), I created an HBase-backed version of the Mahout Matrix. It allows random access to cells and rows as well as assignment, and iteration over rows. viewRow returns a view, and the actual data is loaded lazily only when a get is invoked. I plan to add a VectorInputFormat on top of it, too. The code we need for our algorithms is tested, but there are still parts that are not. I am going to speak about this at HBaseCon, and I wanted to let you know that it can be contributed after some refactoring. Is there any interest? -- Gokhan
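As a rough illustration of the lazy viewRow behavior described above (not the contributed code), here is a hypothetical row view in Java that fetches nothing from HBase until the first element access and then caches the row in a Mahout sparse vector. It assumes the pre-1.0 HBase client API (HTable, Get) and a column family named "d" holding one cell per column index; all names are made up for the example.

import java.io.IOException;
import java.util.Map;
import java.util.NavigableMap;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

// Hypothetical lazy row view over an HBase-backed matrix row.
public class LazyHBaseRowView {

  private static final byte[] FAMILY = Bytes.toBytes("d");

  private final HTable table;
  private final int row;
  private final int numCols;
  private Vector cached;   // stays null until the first get

  public LazyHBaseRowView(HTable table, int row, int numCols) {
    this.table = table;
    this.row = row;
    this.numCols = numCols;
  }

  // The first access triggers a single HBase Get for the whole row.
  public double get(int column) throws IOException {
    if (cached == null) {
      load();
    }
    return cached.get(column);
  }

  private void load() throws IOException {
    Result result = table.get(new Get(Bytes.toBytes(row)));
    cached = new RandomAccessSparseVector(numCols);
    NavigableMap<byte[], byte[]> cells = result.getFamilyMap(FAMILY);
    if (cells != null) {
      for (Map.Entry<byte[], byte[]> cell : cells.entrySet()) {
        // qualifier = column index, value = the cell's double value
        cached.setQuick(Bytes.toInt(cell.getKey()), Bytes.toDouble(cell.getValue()));
      }
    }
  }
}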
Re: HBase backed matrices
2 options: 1- row index as the row key, column index as the column qualifier, and the cell value as the value 2- row index and column index combined as the row key, and the value in a single column called "value" Row indices are kept in a member variable in memory, to make iteration fast. On Wed, May 8, 2013 at 12:11 AM, Ted Dunning ted.dunn...@gmail.com wrote: How did you store the matrix in HBase? On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan gkhn...@gmail.com wrote: Hi, For taking large matrices as input and persisting large models (like factor models), I created an HBase-backed version of Mahout matrix. It allows random access to cells and rows as well as assignment, and iteration over rows. viewRow returns a view, and lazy loads actual data if a get is actually invoked. I plan to add a VectorInputFormat on top of it, too. The code that we need to have for our algorithms is tested, but there are still parts of it that are not. I am going to speak about this at HBaseCon, and I wanted to let you know that it can be contributed after some refactoring. Is there any interest? -- Gokhan -- Gokhan
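As a sketch, the two layouts translate into HBase Puts roughly as follows; the column family "d", the "value" qualifier, and the pre-1.0 Put.add signature are assumptions made for the example.

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical builders for the two cell layouts described above.
public final class CellLayouts {

  private static final byte[] FAMILY = Bytes.toBytes("d");
  private static final byte[] VALUE_QUALIFIER = Bytes.toBytes("value");

  // Option 1: row key = row index, one qualifier per column index.
  static Put option1(int row, int column, double value) {
    Put put = new Put(Bytes.toBytes(row));
    put.add(FAMILY, Bytes.toBytes(column), Bytes.toBytes(value));
    return put;
  }

  // Option 2: row key = (row index, column index) concatenated, single "value" column.
  static Put option2(int row, int column, double value) {
    byte[] key = Bytes.add(Bytes.toBytes(row), Bytes.toBytes(column));
    Put put = new Put(key);
    put.add(FAMILY, VALUE_QUALIFIER, Bytes.toBytes(value));
    return put;
  }

  private CellLayouts() {}
}

Option 1 keeps a whole matrix row inside one HBase row, which helps row iteration; option 2 spreads cells across HBase rows, so reading a matrix row becomes a scan over the row-index key prefix.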
Re: HBase backed matrices
Nope, I simply thought that would make accessing and setting individual cells more difficult. Should I? Do you think it would perform better? And I would like to hear if you have other design choices in mind. On Wed, May 8, 2013 at 12:22 AM, Ted Dunning ted.dunn...@gmail.com wrote: Have you experimented with, for instance, row number as id, value as binary serialized vector? On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan gkhn...@gmail.com wrote: 2 options: 1- row index as the row key, column index as column identifier, and value as value 2- row index and column index combined as the row key, and value in a column called value Row indices are kept in a member variable in memory, to make iteration fast. On Wed, May 8, 2013 at 12:11 AM, Ted Dunning ted.dunn...@gmail.com wrote: How did you store the matrix in HBase? On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan gkhn...@gmail.com wrote: Hi, For taking large matrices as input and persisting large models (like factor models), I created an HBase-backed version of Mahout matrix. It allows random access to cells and rows as well as assignment, and iteration over rows. viewRow returns a view, and lazy loads actual data if a get is actually invoked. I plan to add a VectorInputFormat on top of it, too. The code that we need to have for our algorithms is tested, but there are still parts of it that are not. I am going to speak about this at HBaseCon, and I wanted to let you know that it can be contributed after some refactoring. Is there any interest? -- Gokhan -- Gokhan -- Gokhan
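For reference, a hedged sketch of the layout Ted is suggesting: the entire matrix row serialized as a single binary blob with Mahout's VectorWritable. The family and qualifier names are made up, and the pre-1.0 Put.add signature is assumed.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Hypothetical codec: one HBase row per matrix row, the row stored as one serialized vector.
public final class BlobRowCodec {

  private static final byte[] FAMILY = Bytes.toBytes("d");
  private static final byte[] ROW_QUALIFIER = Bytes.toBytes("row");

  static Put encode(int row, Vector rowVector) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new VectorWritable(rowVector).write(new DataOutputStream(bytes));
    Put put = new Put(Bytes.toBytes(row));
    put.add(FAMILY, ROW_QUALIFIER, bytes.toByteArray());
    return put;
  }

  static Vector decode(byte[] blob) throws IOException {
    VectorWritable writable = new VectorWritable();
    writable.readFields(new DataInputStream(new ByteArrayInputStream(blob)));
    return writable.get();
  }

  private BlobRowCodec() {}
}

Setting a single cell in this layout means rewriting the whole blob, which is exactly the update cost Ted weighs in the next message.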
Re: HBase backed matrices
So if rows are small, blob is probably better; and if they get larger I can make blocks of blobs. I will experiment with this. On Wed, May 8, 2013 at 1:06 AM, Ted Dunning ted.dunn...@gmail.com wrote: It really depends on your access patterns. Blob storage of rows will be much faster for scans and will take much less space. Column storage of values may or may not make things faster, but it is conceptually nicer to not have to update so much. In practice, I am not convinced that you will notice the difference except for really big rows. Remember that you don't have to commit to a single choice. You could use a rolled up representation most of the time and then break the rollups into regions as they get bigger. On Tue, May 7, 2013 at 2:32 PM, Gokhan Capan gkhn...@gmail.com wrote: Nope, I simply thought that would make accessing and setting individual cells more difficult. Should I? Do you think it would perform better? And I would want to hear if you have more design choices in your mind. On Wed, May 8, 2013 at 12:22 AM, Ted Dunning ted.dunn...@gmail.com wrote: Have you experimented with, for instance, row number as id, value as binary serialized vector? On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan gkhn...@gmail.com wrote: 2 options: 1- row index as the row key, column index as column identifier, and value as value 2- row index and column index combined as the row key, and value in a column called value Row indices are kept in a member variable in memory, to make iteration fast. On Wed, May 8, 2013 at 12:11 AM, Ted Dunning ted.dunn...@gmail.com wrote: How did you store the matrix in HBase? On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan gkhn...@gmail.com wrote: Hi, For taking large matrices as input and persisting large models (like factor models), I created an HBase-backed version of Mahout matrix. It allows random access to cells and rows as well as assignment, and iteration over rows. viewRow returns a view, and lazy loads actual data if a get is actually invoked. I plan to add a VectorInputFormat on top of it, too. The code that we need to have for our algorithms is tested, but there are still parts of it that are not. I am going to speak about this at HBaseCon, and I wanted to let you know that it can be contributed after some refactoring. Is there any interest? -- Gokhan -- Gokhan -- Gokhan -- Gokhan
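Under the same assumptions as the sketches above, the "blocks of blobs" idea could look roughly like this: a large row is split into fixed-size column blocks, and each block is stored as its own serialized sub-vector with the block index as the qualifier. The block size and all names are arbitrary choices for the example.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Hypothetical blocked-blob layout: qualifier = block index, value = serialized sub-vector.
public final class BlockedBlobRowCodec {

  private static final byte[] FAMILY = Bytes.toBytes("d");
  private static final int BLOCK_SIZE = 100000;  // columns per block; tune to the row width

  static Put encode(int row, Vector rowVector) throws IOException {
    Put put = new Put(Bytes.toBytes(row));
    for (int offset = 0; offset < rowVector.size(); offset += BLOCK_SIZE) {
      int length = Math.min(BLOCK_SIZE, rowVector.size() - offset);
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new VectorWritable(rowVector.viewPart(offset, length)).write(new DataOutputStream(bytes));
      put.add(FAMILY, Bytes.toBytes(offset / BLOCK_SIZE), bytes.toByteArray());
    }
    return put;
  }

  private BlockedBlobRowCodec() {}
}

A single-cell update then only rewrites the block containing the touched column, while scans still read a handful of large cells per row.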