[jira] [Commented] (MAHOUT-1919) Flink Module breaks the build regularly

2017-02-01 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849064#comment-15849064
 ] 

Suneel Marthi commented on MAHOUT-1919:
---

I suggest we remove the Flink backend from the build path; it's not entirely 
functional anyway, and each Flink upgrade has been introducing new issues that 
break existing functionality. 

> Flink Module breaks the build regularly
> ---
>
> Key: MAHOUT-1919
> URL: https://issues.apache.org/jira/browse/MAHOUT-1919
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Priority: Critical
> Fix For: 0.13.0
>
>
> OOM errors thrown by the Flink module regularly break the nightly build.  
> These should be addressed before the 0.13.0 release... 
> One possibility is to downgrade the Flink dependency in the root pom to the 
> original development dependency (1.0.x, I believe)? 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (MAHOUT-1882) SequentialAccessSparseVector iterateNonZeros is incorrect.

2017-01-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1882:
--
Comment: was deleted

(was: Is there a reproducible test case ? )

> SequentialAccessSparseVector iterateNonZeros is incorrect.
> --
>
> Key: MAHOUT-1882
> URL: https://issues.apache.org/jira/browse/MAHOUT-1882
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.13.0
>
>
> In {{SequentialAccessSparseVector}} a bug is noted.  When counting non-zero 
> elements, {{NonDefaultIterator}} can, under certain circumstances, return an 
> iterator whose size differs from the actual non-zero count.
> {code}
> @Override
> public Iterator<Element> iterateNonZero() {
>   // TODO: this is a bug, since nonDefaultIterator doesn't hold to non-zero contract.
>   return new NonDefaultIterator();
> }
> {code}
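
The contract violation can be illustrated with a minimal, self-contained sketch. All class and method names here are illustrative stand-ins, not Mahout's actual implementation: an iterator over the *stored* ("non-default") entries of a sparse vector is not the same as an iterator over its *non-zero* entries, because explicit zeros can be stored.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (not Mahout's actual classes): a sparse vector whose
// backing arrays may hold explicitly stored zeros.
class SparseVectorSketch {
    final int[] indices;   // positions of the stored entries
    final double[] values; // may contain explicit 0.0 entries

    SparseVectorSketch(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    // Buggy behavior: returns every stored entry, including explicit zeros.
    List<Double> iterateNonDefault() {
        List<Double> out = new ArrayList<>();
        for (double v : values) {
            out.add(v);
        }
        return out;
    }

    // Fixed behavior: honors the non-zero contract by skipping stored zeros.
    List<Double> iterateNonZero() {
        List<Double> out = new ArrayList<>();
        for (double v : values) {
            if (v != 0.0) {
                out.add(v);
            }
        }
        return out;
    }
}

public class NonZeroContractDemo {
    public static void main(String[] args) {
        // Index 3 holds an explicitly stored zero (e.g. after set(3, 0.0)).
        SparseVectorSketch v = new SparseVectorSketch(
            new int[] {0, 3, 7}, new double[] {1.5, 0.0, 2.5});
        System.out.println(v.iterateNonDefault().size()); // 3: wrong as a non-zero count
        System.out.println(v.iterateNonZero().size());    // 2: correct
    }
}
```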



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1900) Add a getter to DenseMatrix for the double[][] values field.

2017-01-07 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1900.
---
Resolution: Implemented

> Add a getter to DenseMatrix for the double[][] values field.
> 
>
> Key: MAHOUT-1900
> URL: https://issues.apache.org/jira/browse/MAHOUT-1900
> Project: Mahout
>  Issue Type: New Feature
>  Components: Math
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> It may be possible to add something like {{getBackingDataStructure()}} all 
> the way up to {{AbstractMatrix}} and return Iterators for sparse matrices and 
> double[][] arrays for dense matrices.
> However, for now I think that we only need a getter for {{DenseMatrix}}.
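
A minimal sketch of what such a getter could look like. The class and method names are hypothetical ({{getBackingDataStructure()}} is only a proposal in the issue); this is not Mahout's actual {{DenseMatrix}}:

```java
// Minimal sketch of a DenseMatrix-like class exposing its backing double[][].
// Names are illustrative only; not Mahout's actual implementation.
public class DenseMatrixSketch {
    private final double[][] values;

    public DenseMatrixSketch(double[][] values) {
        this.values = values;
    }

    // Proposed getter: returns the backing array directly (no copy), so
    // callers can work with raw data without per-element accessors.
    public double[][] getBackingStructure() {
        return values;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}};
        DenseMatrixSketch m = new DenseMatrixSketch(data);
        // Same reference: mutations through the getter are visible in the matrix.
        m.getBackingStructure()[0][0] = 9.0;
        System.out.println(data[0][0] == m.getBackingStructure()[0][0]); // true
    }
}
```

Note the design trade-off: returning the array without a copy is fast but leaks mutability, which is presumably acceptable here since the point is direct access to the values.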





[jira] [Work started] (MAHOUT-1900) Add a getter to DenseMatrix for the double[][] values field.

2017-01-07 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1900 started by Suneel Marthi.
-
> Add a getter to DenseMatrix for the double[][] values field.
> 
>
> Key: MAHOUT-1900
> URL: https://issues.apache.org/jira/browse/MAHOUT-1900
> Project: Mahout
>  Issue Type: New Feature
>  Components: Math
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> It may be possible to add something like {{getBackingDataStructure()}} all 
> the way up to {{AbstractMatrix}} and return Iterators for sparse matrices and 
> double[][] arrays for dense matrices.
> However, for now I think that we only need a getter for {{DenseMatrix}}.





[jira] [Resolved] (MAHOUT-1901) Remove h20 from the Binary Release Build

2017-01-07 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1901.
---
Resolution: Fixed

> Remove h20 from the Binary Release Build
> 
>
> Key: MAHOUT-1901
> URL: https://issues.apache.org/jira/browse/MAHOUT-1901
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>






[jira] [Work started] (MAHOUT-1901) Remove h20 from the Binary Release Build

2017-01-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1901 started by Suneel Marthi.
-
> Remove h20 from the Binary Release Build
> 
>
> Key: MAHOUT-1901
> URL: https://issues.apache.org/jira/browse/MAHOUT-1901
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>






[jira] [Reopened] (MAHOUT-1875) Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)

2016-12-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reopened MAHOUT-1875:
---

> Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)
> -
>
> Key: MAHOUT-1875
> URL: https://issues.apache.org/jira/browse/MAHOUT-1875
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> In {{sparkbindings.drm/package.blockify(...)}}, after testing the density of 
> an incoming block, use {{DenseMatrix(blockAsArrayOfDoubles, true)}} to 
> shallow copy the backing vector array into the {{DenseMatrix}}.  
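
The shallow-vs-deep distinction behind that constructor flag can be sketched in plain Java. This is illustrative only (a stand-in class, not Mahout's actual {{DenseMatrix}}), in the spirit of the {{DenseMatrix(blockAsArrayOfDoubles, true)}} call above:

```java
// Sketch of a shallow-vs-deep-copy constructor flag. Illustrative only,
// not Mahout's actual DenseMatrix implementation.
public class CopyFlagSketch {
    private final double[][] values;

    // shallowCopy == true reuses the caller's row arrays (O(1) extra memory);
    // shallowCopy == false clones every row (O(rows * cols) copy work).
    public CopyFlagSketch(double[][] backing, boolean shallowCopy) {
        if (shallowCopy) {
            this.values = backing;
        } else {
            this.values = new double[backing.length][];
            for (int i = 0; i < backing.length; i++) {
                this.values[i] = backing[i].clone();
            }
        }
    }

    public double get(int r, int c) {
        return values[r][c];
    }

    public static void main(String[] args) {
        double[][] block = {{1.0, 2.0}, {3.0, 4.0}};
        CopyFlagSketch shallow = new CopyFlagSketch(block, true);
        CopyFlagSketch deep = new CopyFlagSketch(block, false);
        block[0][0] = 42.0; // mutate the original backing array
        System.out.println(shallow.get(0, 0)); // 42.0 — shares the backing array
        System.out.println(deep.get(0, 0));    // 1.0  — owns its own copy
    }
}
```

For blockify, the block's backing arrays are freshly built and not reused elsewhere, which is why skipping the defensive copy is safe and faster.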





[jira] [Resolved] (MAHOUT-1875) Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)

2016-12-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1875.
---
Resolution: Fixed

> Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)
> -
>
> Key: MAHOUT-1875
> URL: https://issues.apache.org/jira/browse/MAHOUT-1875
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> In {{sparkbindings.drm/package.blockify(...)}}, after testing the density of 
> an incoming block, use {{DenseMatrix(blockAsArrayOfDoubles, true)}} to 
> shallow copy the backing vector array into the {{DenseMatrix}}.  





[jira] [Resolved] (MAHOUT-1898) Mahout for parsing/analysing scanned medical images

2016-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1898.
---
   Resolution: Not A Problem
Fix Version/s: 0.13.0

> Mahout for parsing/analysing scanned medical images
> ---
>
> Key: MAHOUT-1898
> URL: https://issues.apache.org/jira/browse/MAHOUT-1898
> Project: Mahout
>  Issue Type: Question
>  Components: Classification, Clustering, Collaborative Filtering
>Reporter: Santhosh Kumar V S
> Fix For: 0.13.0
>
>
> We have a set of scanned images of medical records - this is unstructured 
> data - our requirement is to parse these documents and store them in a 
> structured format, possibly in a relational DB or a schemaless DB like MongoDB.
> How will Mahout help in this use case?





[jira] [Commented] (MAHOUT-1898) Mahout for parsing/analysing scanned medical images

2016-12-26 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779184#comment-15779184
 ] 

Suneel Marthi commented on MAHOUT-1898:
---

Firstly, this is a question that should have been posed on 
u...@mahout.apache.org and is not meant to be a Jira.

So if I understand your question right, you are looking to parse scanned images 
and persist them in a DB. Mahout is a machine learning library, not a data 
transformation/processing pipeline.  

There are libraries out there that can read scanned images and convert them 
into pixels. This is not an ML issue; feel free to pose this question again on 
u...@mahout.apache.org. 



> Mahout for parsing/analysing scanned medical images
> ---
>
> Key: MAHOUT-1898
> URL: https://issues.apache.org/jira/browse/MAHOUT-1898
> Project: Mahout
>  Issue Type: Question
>  Components: Classification, Clustering, Collaborative Filtering
>Reporter: Santhosh Kumar V S
> Fix For: 0.13.0
>
>
> We have a set of scanned images of medical records - this is unstructured 
> data - our requirement is to parse these documents and store them in a 
> structured format, possibly in a relational DB or a schemaless DB like MongoDB.
> How will Mahout help in this use case?





[jira] [Assigned] (MAHOUT-1875) Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)

2016-12-25 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1875:
-

Assignee: Suneel Marthi

> Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)
> -
>
> Key: MAHOUT-1875
> URL: https://issues.apache.org/jira/browse/MAHOUT-1875
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> In {{sparkbindings.drm/package.blockify(...)}}, after testing the density of 
> an incoming block, use {{DenseMatrix(blockAsArrayOfDoubles, true)}} to 
> shallow copy the backing vector array into the {{DenseMatrix}}.  





[jira] [Assigned] (MAHOUT-1882) SequentialAccessSparseVector iterateNonZeros is incorrect.

2016-12-25 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1882:
-

Assignee: Suneel Marthi

> SequentialAccessSparseVector iterateNonZeros is incorrect.
> --
>
> Key: MAHOUT-1882
> URL: https://issues.apache.org/jira/browse/MAHOUT-1882
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> In {{SequentialAccessSparseVector}} a bug is noted.  When counting non-zero 
> elements, {{NonDefaultIterator}} can, under certain circumstances, return an 
> iterator whose size differs from the actual non-zero count.
> {code}
> @Override
> public Iterator<Element> iterateNonZero() {
>   // TODO: this is a bug, since nonDefaultIterator doesn't hold to non-zero contract.
>   return new NonDefaultIterator();
> }
> {code}





[jira] [Resolved] (MAHOUT-1767) Unable to run tests on H2O engine in distributed mode

2016-12-25 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1767.
---
Resolution: Won't Fix

> Unable to run tests on H2O engine in distributed mode
> -
>
> Key: MAHOUT-1767
> URL: https://issues.apache.org/jira/browse/MAHOUT-1767
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.11.0
>Reporter: Dmitry Yaraev
>Assignee: Andrew Palumbo
> Fix For: 1.0.0
>
>
> When one follows the instructions located in [README.md for the H2O 
> module|https://github.com/apache/mahout/blob/master/h2o/README.md] and tries 
> to run the tests in distributed mode, the tests run only in local mode. There 
> are three steps in the instructions:
> # {code}
> host-1:~/mahout$ ./bin/mahout h2o-node
> ...
> .. INFO: Cloud of size 1 formed [/W.X.Y.Z:54321]
> {code}
> # {code}
> host-2:~/mahout$ ./bin/mahout h2o-node
> ...
> .. INFO: Cloud of size 2 formed [/A.B.C.D:54322]
> {code}
> # {code}
> host-N:~/mahout/h2o$ mvn test
> ...
> .. INFO: Cloud of size 3 formed [/E.F.G.H:54323]
> ...
> All tests passed.
> ...
> host-N:~/mahout/h2o$
> {code}
> The first two steps start the worker nodes. The last one runs the tests. 
> According to the instructions, launching the tests starts one more worker, 
> which should join the same cloud that the other worker nodes form. But it 
> does not join them, because it has a different cloud name (or _masterURL_ 
> in terms of the code). If you look in the code, you can find the following:
> {code:title=DistributedH2OSuite.scala}
> ...
> mahoutCtx = mahoutH2OContext("mah2out" + System.currentTimeMillis())
> ...
> {code}
> After we removed the code that appends the current time to the cloud name, it 
> started to work.
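
The failure mode can be illustrated with a small sketch (hypothetical Java stand-in; the real code lives in {{DistributedH2OSuite.scala}}): H2O nodes only form one cloud when their cloud names match, and appending a per-run timestamp guarantees a mismatch.

```java
// Sketch of the cloud-name mismatch (illustrative; the real code is in
// DistributedH2OSuite.scala). H2O nodes join a cloud only if their cloud
// names are identical, so a per-run timestamp suffix guarantees a mismatch.
public class CloudNameSketch {
    // Buggy: each run generates a unique cloud name, so the node started by
    // `mvn test` never joins the cloud formed by the workers started earlier.
    static String buggyCloudName() {
        return "mah2out" + System.currentTimeMillis();
    }

    // Fixed: a stable name lets the test node join the existing cloud.
    static String fixedCloudName() {
        return "mah2out";
    }

    public static void main(String[] args) {
        String workerCloud = buggyCloudName();
        // The test node starts at least a millisecond later...
        long t = System.currentTimeMillis();
        while (System.currentTimeMillis() == t) { /* spin until the clock ticks */ }
        String testCloud = buggyCloudName();
        System.out.println(workerCloud.equals(testCloud));  // false: clouds never merge
        System.out.println(fixedCloudName().equals("mah2out")); // true: stable name
    }
}
```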





[jira] [Commented] (MAHOUT-1767) Unable to run tests on H2O engine in distributed mode

2016-12-25 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1517#comment-1517
 ] 

Suneel Marthi commented on MAHOUT-1767:
---

H2O is not being supported now and we have not seen anyone even attempting to 
use it; will resolve this as 'Won't Fix'.

> Unable to run tests on H2O engine in distributed mode
> -
>
> Key: MAHOUT-1767
> URL: https://issues.apache.org/jira/browse/MAHOUT-1767
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.11.0
>Reporter: Dmitry Yaraev
>Assignee: Andrew Palumbo
> Fix For: 1.0.0
>
>
> When one follows the instructions located in [README.md for the H2O 
> module|https://github.com/apache/mahout/blob/master/h2o/README.md] and tries 
> to run the tests in distributed mode, the tests run only in local mode. There 
> are three steps in the instructions:
> # {code}
> host-1:~/mahout$ ./bin/mahout h2o-node
> ...
> .. INFO: Cloud of size 1 formed [/W.X.Y.Z:54321]
> {code}
> # {code}
> host-2:~/mahout$ ./bin/mahout h2o-node
> ...
> .. INFO: Cloud of size 2 formed [/A.B.C.D:54322]
> {code}
> # {code}
> host-N:~/mahout/h2o$ mvn test
> ...
> .. INFO: Cloud of size 3 formed [/E.F.G.H:54323]
> ...
> All tests passed.
> ...
> host-N:~/mahout/h2o$
> {code}
> The first two steps start the worker nodes. The last one runs the tests. 
> According to the instructions, launching the tests starts one more worker, 
> which should join the same cloud that the other worker nodes form. But it 
> does not join them, because it has a different cloud name (or _masterURL_ 
> in terms of the code). If you look in the code, you can find the following:
> {code:title=DistributedH2OSuite.scala}
> ...
> mahoutCtx = mahoutH2OContext("mah2out" + System.currentTimeMillis())
> ...
> {code}
> After we removed the code that appends the current time to the cloud name, it 
> started to work.





[jira] [Resolved] (MAHOUT-1750) Mahout DSL for Flink: Implement ABt

2016-12-25 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1750.
---
Resolution: Won't Fix

> Mahout DSL for Flink: Implement ABt
> ---
>
> Key: MAHOUT-1750
> URL: https://issues.apache.org/jira/browse/MAHOUT-1750
> Project: Mahout
>  Issue Type: Task
>  Components: Flink, Math
>Affects Versions: 0.10.2
>Reporter: Alexey Grigorev
>Priority: Minor
> Fix For: 1.0.0
>
>
> Currently ABt is expressed through AtB, which is not optimal; we need a 
> dedicated implementation for ABt.





[jira] [Updated] (MAHOUT-1750) Mahout DSL for Flink: Implement ABt

2016-12-25 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1750:
--
Fix Version/s: (was: 1.0.0)
   0.13.0

> Mahout DSL for Flink: Implement ABt
> ---
>
> Key: MAHOUT-1750
> URL: https://issues.apache.org/jira/browse/MAHOUT-1750
> Project: Mahout
>  Issue Type: Task
>  Components: Flink, Math
>Affects Versions: 0.10.2
>Reporter: Alexey Grigorev
>Priority: Minor
> Fix For: 0.13.0
>
>
> Currently ABt is expressed through AtB, which is not optimal; we need a 
> dedicated implementation for ABt.





[jira] [Created] (MAHOUT-1894) Add support for Spark 2x backend

2016-12-12 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1894:
-

 Summary: Add support for Spark 2x backend
 Key: MAHOUT-1894
 URL: https://issues.apache.org/jira/browse/MAHOUT-1894
 Project: Mahout
  Issue Type: Task
  Components: spark
Affects Versions: 0.12.0
Reporter: Suneel Marthi
 Fix For: 1.0.0


Add support for Spark 2.x as a backend execution engine.





[jira] [Commented] (MAHOUT-1855) Zeppelin integration: Visualization

2016-11-10 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654334#comment-15654334
 ] 

Suneel Marthi commented on MAHOUT-1855:
---

Can this be marked 'Resolved'? The fix is in 0.12.2 release.

> Zeppelin integration: Visualization
> ---
>
> Key: MAHOUT-1855
> URL: https://issues.apache.org/jira/browse/MAHOUT-1855
> Project: Mahout
>  Issue Type: New Feature
>Affects Versions: 0.12.0
>Reporter: Andrew Palumbo
>Assignee: Trevor Grant
> Fix For: 0.13.0
>
>
> Integrate Mahout and Zeppelin's visualization features. 





[jira] [Commented] (MAHOUT-1758) mahout spark-shell - get illegal acces error at startup

2016-11-09 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651683#comment-15651683
 ] 

Suneel Marthi commented on MAHOUT-1758:
---

I don't see an error in your message; what you are seeing is a warning, and you 
should still be good to go.

> mahout spark-shell - get illegal acces error at startup
> ---
>
> Key: MAHOUT-1758
> URL: https://issues.apache.org/jira/browse/MAHOUT-1758
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.10.1
> Environment: linux unbuntu 14.04,  cluster 1pc master 2pc slave, 16GB 
> ram by node.
> Hadoop 2.6
> Spark 1.4.1
> Mahout 0.10.1
> R 3.0.2/Rhadoop
> scala 2.10
>Reporter: JP Bordenave
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.11.0
>
>
> Hello,
> I installed Hadoop 2.6 and Spark 1.4; SparkR and PySpark are working fine, no 
> issues. Scala 2.10.
> Now I am trying to configure Mahout with my Spark/Hadoop cluster, but when I 
> start the Mahout shell I get an IllegalAccessError; if I try to start in 
> local mode I get the same error. It looks to be an incompatibility between 
> Spark 1.4.x and Mahout 0.10.1.
> Can you confirm? Is there a patch?
> Edit: I saw in the Mahout 0.10.1 release notes that compatibility is with 
> Spark 1.2.2 or lower.
> Thanks for your info,
> JP
> I set my variables and my Spark cluster:
> export SPARK_HOME=/usr/local/spark
> export MASTER=spark://stargate:7077
> {noformat}
> hduser@stargate:~$ mahout spark-shell
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/apache-mahout-distribution-0.10.1/mahout-examples-0.10.1-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/apache-mahout-distribution-0.10.1/mahout-mr-0.10.1-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/spark-1.4.1-bin-hadoop2.6/lib/spark-assembly-1.4.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/apache-mahout-distribution-0.10.1/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 15/07/17 23:17:54 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> [Mahout ASCII-art banner]  version 0.10.0
> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_79)
> Type in expressions to have them evaluated.
> Type :help for more information.
> java.lang.IllegalAccessError: tried to access method 
> org.apache.spark.repl.SparkIMain.classServer()Lorg/apache/spark/HttpServer; 
> from class org.apache.mahout.sparkbindings.shell.MahoutSparkILoop
> at 
> org.apache.mahout.sparkbindings.shell.MahoutSparkILoop.createSparkContext(MahoutSparkILoop.scala:42)
> at $iwC$$iwC.<init>(<console>:11)
> at $iwC.<init>(<console>:18)
> at <init>(<console>:20)
> at .<init>(<console>:24)
> at .<clinit>(<console>)
> at .<init>(<console>:7)
> at .<clinit>(<console>)
> at $print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
> at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at 
> org.apache.mahout.sparkbindings.shell.MahoutSparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(MahoutSparkILoop.scala:63)
> at 
> org.apache.mahout.sparkbindings.shell.MahoutSparkILoop$$anonfun$initializeSpark$1.apply(MahoutSparkILoop.scala:62)
> at 
> org.apache.mahout.sparkbindings.shell.MahoutSparkILoop$$anonfun$initializeSpark$1.apply(MahoutSparkILoop.scala:62)
> at 
> 

[jira] [Commented] (MAHOUT-1889) Mahout doesn't work with Spark 2.0

2016-10-26 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608860#comment-15608860
 ] 

Suneel Marthi commented on MAHOUT-1889:
---

We don't support Spark 2.x yet, and yes, the Mahout spark shell has had to be 
tweaked with each Spark upgrade, in large part thanks to the incompatible and 
breaking changes that come with each Spark release. 

It's not on the immediate roadmap to fix this; supporting Spark 2.x is the 
lowest priority at the moment for the project. Most Mahout users are still on 
Spark 1.5.x or 1.6.x, so it doesn't make sense to move to Spark 2.x yet. 

Feel free to submit a PR for Spark 2.x support nevertheless.

> Mahout doesn't work with Spark 2.0
> --
>
> Key: MAHOUT-1889
> URL: https://issues.apache.org/jira/browse/MAHOUT-1889
> Project: Mahout
>  Issue Type: Bug
>Reporter: Sergey Svinarchuk
>
> In Spark 2.0 the library paths and the classpath were changed. If the 
> classpath is corrected for Spark 2.0, all Spark jobs fail with 
> java.lang.NoSuchMethodError, because the Spark API changed.
> Example for spark-shell:
> {code}
> ./bin/mahout spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.spark.repl.SparkILoop.setPrompt(Ljava/lang/String;)V
>   at 
> org.apache.mahout.sparkbindings.shell.MahoutSparkILoop.<init>(MahoutSparkILoop.scala:58)
>   at org.apache.mahout.sparkbindings.shell.Main$.main(Main.scala:32)
>   at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
> {code}





[jira] [Commented] (MAHOUT-1885) Inital Implementation of VCL Bindings

2016-10-17 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582075#comment-15582075
 ] 

Suneel Marthi commented on MAHOUT-1885:
---

This has been fixed as of today in ViennaCL. Please update your ViennaCL 
binaries. 

> Inital Implementation of VCL Bindings
> -
>
> Key: MAHOUT-1885
> URL: https://issues.apache.org/jira/browse/MAHOUT-1885
> Project: Mahout
>  Issue Type: Improvement
>  Components: Math
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.13.0
>
>
> Push a working experimental branch of VCL bindings into master.  There is 
> still a lot of work to be done.  All tests are passing; at the moment I am 
> opening this JIRA mostly to get a number for the PR and to test profiles 
> against on Travis. 





[jira] [Commented] (MAHOUT-1889) Mahout doesn't work with Spark 2.0

2016-10-17 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582071#comment-15582071
 ] 

Suneel Marthi commented on MAHOUT-1889:
---

Thanks for reporting this; it makes sense to work on this if any of the Hadoop 
vendors are already packaging Spark 2.x in their distros. Feel free to submit a 
patch nevertheless. 

> Mahout doesn't work with Spark 2.0
> --
>
> Key: MAHOUT-1889
> URL: https://issues.apache.org/jira/browse/MAHOUT-1889
> Project: Mahout
>  Issue Type: Bug
>Reporter: Sergey Svinarchuk
>
> In Spark 2.0 the library paths and the classpath were changed. If the 
> classpath is corrected for Spark 2.0, all Spark jobs fail with 
> java.lang.NoSuchMethodError, because the Spark API changed.
> Example for spark-shell:
> {code}
> ./bin/mahout spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.spark.repl.SparkILoop.setPrompt(Ljava/lang/String;)V
>   at 
> org.apache.mahout.sparkbindings.shell.MahoutSparkILoop.<init>(MahoutSparkILoop.scala:58)
>   at org.apache.mahout.sparkbindings.shell.Main$.main(Main.scala:32)
>   at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
> {code}





[jira] [Commented] (MAHOUT-1882) SequentialAccessSparseVector iterateNonZeros is incorrect.

2016-10-14 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576800#comment-15576800
 ] 

Suneel Marthi commented on MAHOUT-1882:
---

Is there a reproducible test case ? 

> SequentialAccessSparseVector iterateNonZeros is incorrect.
> --
>
> Key: MAHOUT-1882
> URL: https://issues.apache.org/jira/browse/MAHOUT-1882
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
> Fix For: 0.13.0
>
>
> In {{SequentialAccessSparseVector}} a bug is noted.  When counting non-zero 
> elements, {{NonDefaultIterator}} can, under certain circumstances, return an 
> iterator whose size differs from the actual non-zero count.
> {code}
> @Override
> public Iterator<Element> iterateNonZero() {
>   // TODO: this is a bug, since nonDefaultIterator doesn't hold to non-zero contract.
>   return new NonDefaultIterator();
> }
> {code}





[jira] [Updated] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1884:
--
Fix Version/s: 0.13.0

> Allow specification of dimensions of a DRM
> --
>
> Key: MAHOUT-1884
> URL: https://issues.apache.org/jira/browse/MAHOUT-1884
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Sebastian Schelter
>Assignee: Sebastian Schelter
>Priority: Minor
> Fix For: 0.13.0
>
>
> Currently, in many cases, a DRM must be read to compute its dimensions when a 
> user calls nrow or ncol. This also implicitly caches the corresponding DRM.
> In some cases, the user actually knows the matrix dimensions (e.g., when the 
> matrices are synthetically generated, or when some metadata about them is 
> known). In such cases, the user should be able to specify the dimensions upon 
> creating the DRM and the caching should be avoided. 
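
The proposal can be sketched with a hypothetical API (plain Java stand-in; Mahout's actual DRM DSL is Scala and the names here are illustrative): dimensions are computed lazily by an expensive scan unless the caller supplies them up front.

```java
// Sketch of the idea (hypothetical API, not Mahout's actual DRM DSL): a
// distributed-matrix handle that computes its dimensions lazily (in the real
// system, a scan that also implicitly caches the data) unless the caller
// already knows them and passes them in.
public class DrmDimensionsSketch {
    private final double[][] rows; // stand-in for the distributed data
    private long nrow = -1;        // -1 means "unknown, compute on demand"

    DrmDimensionsSketch(double[][] rows) {
        this.rows = rows;
    }

    // User-supplied dimensions: no scan and no implicit caching needed.
    DrmDimensionsSketch(double[][] rows, long nrow) {
        this.rows = rows;
        this.nrow = nrow;
    }

    long nrow() {
        if (nrow < 0) {
            // In the real system this would trigger reading (and caching) the DRM.
            nrow = rows.length;
        }
        return nrow;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {3, 4}, {5, 6}};
        // Dimensions known up front (e.g. synthetic data): skip the scan.
        DrmDimensionsSketch drm = new DrmDimensionsSketch(data, 3);
        System.out.println(drm.nrow()); // 3, without touching the data
    }
}
```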





[jira] [Commented] (MAHOUT-1883) Create a type of IndexedDataset that filters unneeded data for CCO

2016-10-13 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574067#comment-15574067
 ] 

Suneel Marthi commented on MAHOUT-1883:
---

[~pferrel] can this be resolved as 'Fixed'?

> Create a type of IndexedDataset that filters unneeded data for CCO
> --
>
> Key: MAHOUT-1883
> URL: https://issues.apache.org/jira/browse/MAHOUT-1883
> Project: Mahout
>  Issue Type: New Feature
>  Components: Collaborative Filtering
>Affects Versions: 0.13.0
>Reporter: Pat Ferrel
>Assignee: Pat Ferrel
> Fix For: 0.13.0
>
>
> The collaborative filtering CCO algo uses DRMs for each "indicator" type. The 
> input must have the same set of user-ids, and so the row rank for all input 
> matrices must be the same.
> In the past we have padded the row-id dictionary to include new rows only in 
> secondary matrices. This can lead to very large amounts of data processed in 
> the CCO pipeline that does not affect the results. Put another way, if a row 
> doesn't exist in the primary matrix, there will be no cross-occurrence in the 
> other calculated cooccurrence matrices.
> If we are calculating P'P and P'S, S will not need rows that don't exist in P, 
> so this Jira is to create an IndexedDataset companion object that takes an 
> RDD[(String, String)] of interactions but that uses the dictionary from P for 
> row-ids and filters out all data that doesn't correspond to P. The companion 
> object will create the row-ids dictionary if it is not passed in, and use it 
> to filter if it is passed in.
> We have seen data that can be reduced by many orders of magnitude using this 
> technique. This could be handled outside of Mahout but always produces better 
> performance and so this version of data-prep seems worth including.
> It does not affect the CLI version yet but could be included there in a 
> future Jira.
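
The filtering itself can be sketched as follows (plain Java stand-in for the Spark/RDD version; names are illustrative, not the proposed companion object's actual API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the proposed filtering (plain Java stand-in for the RDD version):
// keep only interactions whose row-id (user-id) already appears in the
// primary matrix P's row-id dictionary.
public class CcoFilterSketch {
    static List<String[]> filterByPrimaryRows(List<String[]> interactions,
                                              Set<String> primaryRowIds) {
        List<String[]> kept = new ArrayList<>();
        for (String[] pair : interactions) {
            // pair[0] is the row-id (user-id), pair[1] the item-id.
            if (primaryRowIds.contains(pair[0])) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> pRows = new HashSet<>(Arrays.asList("u1", "u2"));
        List<String[]> secondary = Arrays.asList(
            new String[] {"u1", "item-a"},
            new String[] {"u3", "item-b"}, // u3 absent from P: no cross-occurrence possible
            new String[] {"u2", "item-c"});
        // Rows absent from P cannot contribute to P'S, so they are dropped.
        System.out.println(filterByPrimaryRows(secondary, pRows).size()); // 2
    }
}
```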





[jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

2016-10-13 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574059#comment-15574059
 ] 

Suneel Marthi commented on MAHOUT-1788:
---

Is someone still working on this? 

> spark-itemsimilarity integration test script cleanup
> 
>
> Key: MAHOUT-1788
> URL: https://issues.apache.org/jira/browse/MAHOUT-1788
> Project: Mahout
>  Issue Type: Improvement
>  Components: cooccurrence
>Affects Versions: 0.11.0
>Reporter: Pat Ferrel
>Assignee: Pat Ferrel
>Priority: Trivial
> Fix For: 1.0.0
>
>
> The binary release does not contain data for the itemsimilarity tests; 
> neither the binary nor the source version will run on a cluster unless data 
> is hand-copied to HDFS.
> Clean this up so the script copies data if needed and the data ships in both 
> versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1871) Kmeans - java.lang.IllegalStateException: No input clusters found..... Check your -c argument

2016-10-13 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1871.
---
Resolution: Not A Bug

You need to provide initial centroids - IIRC you can either provide the initial 
centroids or provide a folder with the -c option to have random initial 
centroids generated. You may want to check the KMeans path in 
examples/bin/cluster-reuters.sh to see how it's being done.

This is a question and not a bug; please pose such questions on user@mahout . 

> Kmeans - java.lang.IllegalStateException: No input clusters found. Check 
> your -c argument
> -
>
> Key: MAHOUT-1871
> URL: https://issues.apache.org/jira/browse/MAHOUT-1871
> Project: Mahout
>  Issue Type: Question
>  Components: Clustering
>Affects Versions: 0.12.1
> Environment: S.O. Centos 6.5
> hadoop 2.7.2
>Reporter: Juan Carlos Sipan Robles
>Priority: Critical
> Fix For: 0.13.0
>
>
> Running kmeans with the following parameters gives the following error.
> 16/06/12 17:35:43 INFO KMeansDriver: convergence: 0.5 max Iterations: 10
> 16/06/12 17:35:43 INFO CodecPool: Got brand-new decompressor [.deflate]
> Exception in thread "main" java.lang.IllegalStateException: No input clusters 
> found in /mdb/clustered_data/part-randomSeed. Check your -c argument.
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:213)
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:110)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> [SSH] exit-status: 1
> Finished: FAILURE
> Command Execution:
> hdfs dfs -rm -R /mdb/mahout_vectors/
> hdfs dfs -rm -R /mdb/mahout_seq/
> hdfs dfs -rm -R /mdb/mahout_data/
> hdfs dfs -rm -R /mdb/clustered_data/
> echo # THE HDFS FOLDERS ARE DELETED #
> hdfs dfs -mkdir /mdb/mahout_vectors/
> hdfs dfs -mkdir /mdb/mahout_seq/
> hdfs dfs -mkdir /mdb/mahout_data/
> hdfs dfs -mkdir /mdb/clustered_data/
> echo # upload the file #
> hdfs dfs -put $fichero /mdb/mahout_data/
> echo # generate sequence files #
> mahout seqdirectory -i /mdb/mahout_data/ -o /mdb/mahout_seq -c UTF-8 -chunk 
> 64 -xm sequential
> echo # generate the vectors #
> mahout seq2sparse -i /mdb/mahout_seq/ -o /mdb/mahout_vectors/ --namedVector
> echo # run kmeans #
> mahout kmeans -i /mdb/mahout_vectors/tfidf-vectors/ -c /mdb/clustered_data -o 
> /mdb/mahout_data -dm 
> org.apache.mahout.common.distance.EuclideanDistanceMeasure -x 10 -k 20 -ow 
> --clustering



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1888) Performance Bug with Mahout Vector Serialization

2016-10-13 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1888.
---
Resolution: Fixed

> Performance Bug with Mahout Vector Serialization
> 
>
> Key: MAHOUT-1888
> URL: https://issues.apache.org/jira/browse/MAHOUT-1888
> Project: Mahout
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.12.2
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Identified a performance bug with Mahout Vector serialization in 
> DistributedSparkSuite.
> Add the following to the SparkConf:
> {code}
> .set("spark.kryo.registrationRequired", "true")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1888) Performance Bug with Mahout Vector Serialization

2016-10-12 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1888 started by Suneel Marthi.
-
> Performance Bug with Mahout Vector Serialization
> 
>
> Key: MAHOUT-1888
> URL: https://issues.apache.org/jira/browse/MAHOUT-1888
> Project: Mahout
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.12.2
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Identified a performance bug with Mahout Vector serialization in 
> DistributedSparkSuite.
> Add the following to the SparkConf:
> {code}
> .set("spark.kryo.registrationRequired", "true")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAHOUT-1888) Performance Bug with Mahout Vector Serialization

2016-10-09 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1888:
-

Assignee: Suneel Marthi

> Performance Bug with Mahout Vector Serialization
> 
>
> Key: MAHOUT-1888
> URL: https://issues.apache.org/jira/browse/MAHOUT-1888
> Project: Mahout
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.12.2
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Identified a performance bug with Mahout Vector serialization in 
> DistributedSparkSuite.
> Add the following to the SparkConf:
> {code}
> .set("spark.kryo.registrationRequired", "true")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAHOUT-1888) Performance Bug with Mahout Vector Serialization

2016-10-09 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1888:
-

 Summary: Performance Bug with Mahout Vector Serialization
 Key: MAHOUT-1888
 URL: https://issues.apache.org/jira/browse/MAHOUT-1888
 Project: Mahout
  Issue Type: Bug
  Components: spark
Affects Versions: 0.12.2
Reporter: Suneel Marthi
 Fix For: 0.13.0


Identified a performance bug with Mahout Vector serialization in 
DistributedSparkSuite.

Add the following to the SparkConf:

{code}
.set("spark.kryo.registrationRequired", "true")
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1582) Create simpler row and column aggregation API at local level

2016-10-09 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1582.
---
   Resolution: Won't Fix
Fix Version/s: 0.13.0

Resolving this as 'Won't Fix'; please feel free to create a new Jira.

> Create simpler row and column aggregation API at local level
> 
>
> Key: MAHOUT-1582
> URL: https://issues.apache.org/jira/browse/MAHOUT-1582
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Suneel Marthi
>  Labels: legacy, math, scala
> Fix For: 0.13.0
>
>
> The issue is that the current row and column aggregation API makes it 
> difficult to do anything but row by row aggregation using anonymous classes.  
> There is no scope for being aware of locality, nor to use the well known 
> function definitions in Functions.  This makes lots of optimizations 
> impossible, and many of these are optimizations that we want to have. An 
> example would be summing the absolute values of the elements. With the 
> current API it would be very hard to optimize for sparse matrices or for the 
> wrong direction of iteration, but with a different API this should be easy.
> What I suggest is an API of this form:
> {code}
>Vector aggregateRows(DoubleDoubleFunction combiner, DoubleFunction mapper)
> {code}
> This will produce a vector with one element per row in the original.  The 
> nice thing here is that if the matrix is row major, we can iterate over rows 
> and accumulate a value for each row using sparsity as available.  On the 
> other hand, if the matrix is column major, we can keep a vector of 
> accumulators and still use sparsity as appropriate.  
> The use of sparsity comes in because the matrix code now has control over 
> both of the loops involved and also has visibility into properties of the map 
> and combine functions.  For instance, ABS(0) == 0 so if we combine with PLUS, 
> we can use a sparse iterator.
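A minimal sketch of the suggested API in plain Java, with a dense 2-D array standing in for a row-major matrix; the names are illustrative, not Mahout's actual types:

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Sketch of aggregateRows(combiner, mapper): each element is passed through
// the mapper, and the mapped values of a row are folded with the combiner,
// producing one value per row. Because the implementation owns both loops, a
// column-major or sparse matrix could reorder or skip work without changing
// the result.
public class RowAggregation {

    static double[] aggregateRows(double[][] m,
                                  DoubleBinaryOperator combiner,
                                  DoubleUnaryOperator mapper) {
        double[] out = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            double acc = mapper.applyAsDouble(m[r][0]);  // rows assumed non-empty
            for (int c = 1; c < m[r].length; c++) {
                acc = combiner.applyAsDouble(acc, mapper.applyAsDouble(m[r][c]));
            }
            out[r] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, -2, 0}, {0, 3, -4}};
        // The example from the issue: sum of absolute values per row.
        // Since ABS(0) == 0 and the combiner is PLUS, a sparse implementation
        // could iterate over non-zeros only.
        double[] rowAbsSums = aggregateRows(m, Double::sum, Math::abs);
        System.out.println(Arrays.toString(rowAbsSums));
    }
}
```

A real implementation would inspect the matrix's storage order and the functions' properties (e.g. whether mapper(0) == 0) before choosing a loop order, which is exactly the control the current API denies.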



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1830) Publish scaladocs for Mahout 0.12.0 release

2016-10-09 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560789#comment-15560789
 ] 

Suneel Marthi commented on MAHOUT-1830:
---

If someone would like to work on this, please feel free to reach out. 

> Publish scaladocs for Mahout 0.12.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Priority: Critical
>  Labels: Newbie
> Fix For: 0.13.0
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs presently 
> published are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1830) Publish scaladocs for Mahout 0.12.0 release

2016-10-09 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1830:
--
Assignee: (was: Suneel Marthi)

> Publish scaladocs for Mahout 0.12.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Priority: Critical
>  Labels: Newbie
> Fix For: 0.13.0
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs presently 
> published are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1887) Document org.apache.mahout.classifier.sgd.RunLogistic

2016-10-09 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560786#comment-15560786
 ] 

Suneel Marthi commented on MAHOUT-1887:
---

Hey Karl, the input is specified via the command-line argument. Would you want 
that to be explicitly specified in the code comments? Feel free to make a PR. 

> Document org.apache.mahout.classifier.sgd.RunLogistic
> -
>
> Key: MAHOUT-1887
> URL: https://issues.apache.org/jira/browse/MAHOUT-1887
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Karl Richter
>
> It'd be nice to know from the class comment where to get input data for 
> `org.apache.mahout.classifier.sgd.RunLogistic` in the examples module.
> Experienced with mahout-0.12.2-24-gb5fe4aa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1626) Support for required quasi-algebraic operations and starting with aggregating rows/blocks

2016-09-12 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1626.
---
Resolution: Won't Fix

> Support for required quasi-algebraic operations and starting with aggregating 
> rows/blocks
> -
>
> Key: MAHOUT-1626
> URL: https://issues.apache.org/jira/browse/MAHOUT-1626
> Project: Mahout
>  Issue Type: New Feature
>  Components: Math
>Affects Versions: 0.10.0
>Reporter: Gokhan Capan
>Assignee: Gokhan Capan
>  Labels: DSL, scala, spark
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1881) flink-config.yaml is not copied to $MAHOUT_HOME/conf in Binary Distro

2016-09-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1881 started by Suneel Marthi.
-
> flink-config.yaml is not copied to $MAHOUT_HOME/conf in Binary Distro 
> --
>
> Key: MAHOUT-1881
> URL: https://issues.apache.org/jira/browse/MAHOUT-1881
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.13.0
>
>
> The {{$MAHOUT_HOME/flink-config.yaml}} file, which is used to configure the 
> Flink degree of parallelism, number of tasks, and temp caching directory, is 
> not included in {{$MAHOUT_HOME/conf}} in the binary distribution.  It seems 
> that the whole directory is overwritten with mr .props files during the 
> release process.  
> The file exists in the source repository. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAHOUT-1881) flink-config.yaml is not copied to $MAHOUT_HOME/conf in Binary Distro

2016-09-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1881:
-

Assignee: Suneel Marthi

> flink-config.yaml is not copied to $MAHOUT_HOME/conf in Binary Distro 
> --
>
> Key: MAHOUT-1881
> URL: https://issues.apache.org/jira/browse/MAHOUT-1881
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.13.0
>
>
> The {{$MAHOUT_HOME/flink-config.yaml}} file, which is used to configure the 
> Flink degree of parallelism, number of tasks, and temp caching directory, is 
> not included in {{$MAHOUT_HOME/conf}} in the binary distribution.  It seems 
> that the whole directory is overwritten with mr .props files during the 
> release process.  
> The file exists in the source repository. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MAHOUT-1818) dals test failing in Flink-bindings

2016-09-06 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469512#comment-15469512
 ] 

Suneel Marthi edited comment on MAHOUT-1818 at 9/7/16 4:44 AM:
---

Could we mark this as resolved with 'Will not Fix' as the resolution? 
[~Andrew_Palumbo]


was (Author: smarthi):
Could we mark this resolved ? [~Andrew_Palumbo]

> dals test failing in Flink-bindings
> ---
>
> Key: MAHOUT-1818
> URL: https://issues.apache.org/jira/browse/MAHOUT-1818
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.11.2
>Reporter: Andrew Palumbo
> Fix For: 1.0.0
>
>
> {{dals}} test fails in Flink bindings with an OOM.  Numerically the test 
> passes when the matrix being decomposed in the test is lowered to 50 x 50, 
> but the default size of the matrix in the 
> {{DistributedDecompositionsSuiteBase}} is 500 x 500. 
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:2271)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at 
> java.io.ObjectOutputStream$BlockDataOutputStream.writeBlockHeader(ObjectOutputStream.java:1893)
>   at 
> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1874)
>   at 
> java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:300)
>   at 
> org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:252)
>   at 
> org.apache.flink.runtime.operators.util.TaskConfig.setStubWrapper(TaskConfig.java:273)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.createDataSourceVertex(JobGraphGenerator.java:893)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:286)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:109)
>   at 
> org.apache.flink.optimizer.plan.SourcePlanNode.accept(SourcePlanNode.java:86)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:128)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:188)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1831) Integrate Flink Shell with Mahout for interactive Samsara

2016-09-06 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469527#comment-15469527
 ] 

Suneel Marthi commented on MAHOUT-1831:
---

I would close this issue as 'Will not Implement' for now. I don't think we'll 
ever see any work being done on the Flink Shell in the Flink project.

> Integrate Flink Shell with Mahout for interactive Samsara
> -
>
> Key: MAHOUT-1831
> URL: https://issues.apache.org/jira/browse/MAHOUT-1831
> Project: Mahout
>  Issue Type: New Feature
>  Components: Flink
>Affects Versions: 0.12.0
>Reporter: Andrew Palumbo
> Fix For: 1.0.0
>
>
> Integrate Flink Shell with Mahout to be able to perform interactive Samsara, 
> similar to what's presently being done with Spark shell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1864) Twenty Newsgroups Classification Example fails in case running with MAHOUT_LOCAL=true

2016-09-06 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469520#comment-15469520
 ] 

Suneel Marthi commented on MAHOUT-1864:
---

I would vote to resolve this jira as 'Won't Fix'.  We have ceased support for 
this since the 0.9 release. 

> Twenty Newsgroups Classification Example fails in case running with 
> MAHOUT_LOCAL=true 
> --
>
> Key: MAHOUT-1864
> URL: https://issues.apache.org/jira/browse/MAHOUT-1864
> Project: Mahout
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 0.12.0
>Reporter: giriraj sharma
>Priority: Minor
>  Labels: easyfix, easytest, newbie
> Fix For: 0.13.0
>
>
> Twenty Newsgroups Classification Example fails when running with 
> {{MAHOUT_LOCAL=true}} or when the {{HADOOP_HOME}} env variable is not set.
> [Newsgroups|https://mahout.apache.org/users/classification/twenty-newsgroups.html]
>  lists instructions for running this classifier. When running in 
> standalone mode ({{MAHOUT_LOCAL=true}}), i.e., running {{$ 
> ./examples/bin/classify-20newsgroups.sh}}, the script runs 
> {{./examples/bin/set-dfs-commands.sh}} internally to export Hadoop-related 
> env variables.
> {{set-dfs-commands.sh}} attempts to check for the Hadoop version despite 
> running with {{MAHOUT_LOCAL}} set to true. IMHO the script works fine given 
> the prerequisites, but it would also make sense to update 
> {{./examples/bin/set-dfs-commands.sh}} to export Hadoop env variables only 
> when {{MAHOUT_LOCAL}} is not set to true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1818) dals test failing in Flink-bindings

2016-09-06 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469512#comment-15469512
 ] 

Suneel Marthi commented on MAHOUT-1818:
---

Could we mark this resolved? [~Andrew_Palumbo]

> dals test failing in Flink-bindings
> ---
>
> Key: MAHOUT-1818
> URL: https://issues.apache.org/jira/browse/MAHOUT-1818
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.11.2
>Reporter: Andrew Palumbo
> Fix For: 1.0.0
>
>
> {{dals}} test fails in Flink bindings with an OOM.  Numerically the test 
> passes when the matrix being decomposed in the test is lowered to 50 x 50, 
> but the default size of the matrix in the 
> {{DistributedDecompositionsSuiteBase}} is 500 x 500. 
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:2271)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at 
> java.io.ObjectOutputStream$BlockDataOutputStream.writeBlockHeader(ObjectOutputStream.java:1893)
>   at 
> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1874)
>   at 
> java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>   at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:300)
>   at 
> org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:252)
>   at 
> org.apache.flink.runtime.operators.util.TaskConfig.setStubWrapper(TaskConfig.java:273)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.createDataSourceVertex(JobGraphGenerator.java:893)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:286)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:109)
>   at 
> org.apache.flink.optimizer.plan.SourcePlanNode.accept(SourcePlanNode.java:86)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:128)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:188)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1872) Implementation Issue of getting number of users in ParallelALSFactorizationJob.java

2016-09-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1872:
--
Fix Version/s: (was: 0.12.2)
   (was: 0.12.1)

> Implementation Issue of getting number of users in 
> ParallelALSFactorizationJob.java
> ---
>
> Key: MAHOUT-1872
> URL: https://issues.apache.org/jira/browse/MAHOUT-1872
> Project: Mahout
>  Issue Type: Bug
>  Components: Collaborative Filtering
>Affects Versions: 1.0.0, 0.12.1, 0.13.0, 0.12.2
> Environment: Ubuntu-14.04, Apache Hadoop-2.6.0, Java-1.8
>Reporter: Tarun Gulyani
>  Labels: easyfix, patch
> Fix For: 1.0.0, 0.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Code "int numUsers = (int) 
> userRatings.getCounters().findCounter(Stats.NUM_USERS).getValue();" is 
> calling in "ParallelALSFactorizationJob.java" after completion of "average 
> rating" job which is called after "user rating" job. Therefor JobClient not 
> able to get the information of number of user directly from Application 
> Master and to get this information it redirect to Job History Server. 
> Therefore ALS job able to run successfully unless Job History Server is 
> running. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1871) Kmeans - java.lang.IllegalStateException: No input clusters found..... Check your -c argument

2016-09-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1871:
--
Fix Version/s: (was: 0.12.1)
   0.13.0

> Kmeans - java.lang.IllegalStateException: No input clusters found. Check 
> your -c argument
> -
>
> Key: MAHOUT-1871
> URL: https://issues.apache.org/jira/browse/MAHOUT-1871
> Project: Mahout
>  Issue Type: Question
>  Components: Clustering
>Affects Versions: 0.12.1
> Environment: S.O. Centos 6.5
> hadoop 2.7.2
>Reporter: Juan Carlos Sipan Robles
>Priority: Critical
> Fix For: 0.13.0
>
>
> Running kmeans with the following parameters gives the following error.
> 16/06/12 17:35:43 INFO KMeansDriver: convergence: 0.5 max Iterations: 10
> 16/06/12 17:35:43 INFO CodecPool: Got brand-new decompressor [.deflate]
> Exception in thread "main" java.lang.IllegalStateException: No input clusters 
> found in /mdb/clustered_data/part-randomSeed. Check your -c argument.
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:213)
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:110)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> [SSH] exit-status: 1
> Finished: FAILURE
> Command Execution:
> hdfs dfs -rm -R /mdb/mahout_vectors/
> hdfs dfs -rm -R /mdb/mahout_seq/
> hdfs dfs -rm -R /mdb/mahout_data/
> hdfs dfs -rm -R /mdb/clustered_data/
> echo # THE HDFS FOLDERS ARE DELETED #
> hdfs dfs -mkdir /mdb/mahout_vectors/
> hdfs dfs -mkdir /mdb/mahout_seq/
> hdfs dfs -mkdir /mdb/mahout_data/
> hdfs dfs -mkdir /mdb/clustered_data/
> echo # upload the file #
> hdfs dfs -put $fichero /mdb/mahout_data/
> echo # generate sequence files #
> mahout seqdirectory -i /mdb/mahout_data/ -o /mdb/mahout_seq -c UTF-8 -chunk 
> 64 -xm sequential
> echo # generate the vectors #
> mahout seq2sparse -i /mdb/mahout_seq/ -o /mdb/mahout_vectors/ --namedVector
> echo # run kmeans #
> mahout kmeans -i /mdb/mahout_vectors/tfidf-vectors/ -c /mdb/clustered_data -o 
> /mdb/mahout_data -dm 
> org.apache.mahout.common.distance.EuclideanDistanceMeasure -x 10 -k 20 -ow 
> --clustering



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1865) Remove Hadoop 1 support.

2016-09-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1865.
---
Resolution: Fixed

> Remove Hadoop 1 support.
> 
>
> Key: MAHOUT-1865
> URL: https://issues.apache.org/jira/browse/MAHOUT-1865
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Remove support for Hadoop 1.
> 1. Disable Jenkins Hadoop 1 build.
> 2. Remove Hadoop 1 profile from the root {{pom.xml}}.
> 3. Refactor any Hadoop1 specific code *outside of* the {{mr}} module.
> 4. Update documentation.
> 5. Notify user@.
> Hadoop1HDFSUtils is likely the only code that this will affect.
>   





[jira] [Updated] (MAHOUT-1830) Publish scaladocs for Mahout 0.12.0 release

2016-09-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1830:
--
Labels: Newbie  (was: )

> Publish scaladocs for Mahout 0.12.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Critical
>  Labels: Newbie
> Fix For: 0.13.0
>
>
> Need to publish scaladocs for Mahout 0.12.0, present scaladocs out there are 
> from 0.10.2 release.





[jira] [Resolved] (MAHOUT-1880) Remove H2O Bindings from the release binaries

2016-09-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1880.
---
Resolution: Fixed

> Remove H2O Bindings from the release binaries 
> --
>
> Key: MAHOUT-1880
> URL: https://issues.apache.org/jira/browse/MAHOUT-1880
> Project: Mahout
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.13.0
>
>
> Since the H2O bindings are very large (~20 MB), we will no longer ship them 
> in the binary release.  They will continue to be available through source 
> builds. 





[jira] [Work started] (MAHOUT-1865) Remove Hadoop 1 support.

2016-09-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1865 started by Suneel Marthi.
-
> Remove Hadoop 1 support.
> 
>
> Key: MAHOUT-1865
> URL: https://issues.apache.org/jira/browse/MAHOUT-1865
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Remove support for Hadoop 1.
> 1. Disable Jenkins Hadoop 1 build.
> 2. Remove Hadoop 1 profile from the root {{pom.xml}}.
> 3. Refactor any Hadoop1 specific code *outside of* the {{mr}} module.
> 4. Update documentation.
> 5. Notify user@.
> Hadoop1HDFSUtils is likely the only code that this will affect.
>   





[jira] [Assigned] (MAHOUT-1880) Remove H2O Bindings from the release binaries

2016-09-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1880:
-

Assignee: Suneel Marthi

> Remove H2O Bindings from the release binaries 
> --
>
> Key: MAHOUT-1880
> URL: https://issues.apache.org/jira/browse/MAHOUT-1880
> Project: Mahout
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.13.0
>
>
> Since the H2O bindings are very large (~20 MB), we will no longer ship them 
> in the binary release.  They will continue to be available through source 
> builds. 





[jira] [Updated] (MAHOUT-1876) Mahout fails to read from lucene index of solr-5.5.2

2016-08-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1876:
--
Environment: 
Solr: 5.5.2
JDK: 1.7.0
Mahout: 0.12.2
OS: Linux

  was:
Solr: 6.1.0
JDK: 1.8.0_92
Mahout: 0.12.2
OS: Linux


> Mahout fails to read from lucene index of solr-5.5.2
> 
>
> Key: MAHOUT-1876
> URL: https://issues.apache.org/jira/browse/MAHOUT-1876
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
> Environment: Solr: 5.5.2
> JDK: 1.7.0
> Mahout: 0.12.2
> OS: Linux
>Reporter: Raviteja Lokineni
> Fix For: 0.13.0
>
>
> Command: {noformat}bin/mahout lucene.vector --dir 
> ~/softwares/solr-6.1.0/server/solr/nlp-core/data/index --output 
> /tmp/solr-nlp-core/out.vec --field rspns_val --dictOut 
> /tmp/solr-nlp-core/dictionary.txt --norm 2{noformat}
> Stacktrace:
> {noformat}
> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running 
> locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-examples-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-mr-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/lib/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" 
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource: 
> ChecksumIndexInput(MMapIndexInput(path="/home/lok268/softwares/solr-6.1.0/server/solr/nlp-core/data/index/segments_2"))):
>  6 (needs to be between 0 and 1)
> at 
> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:148)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:329)
> at 
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
> at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at 
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
> at 
> org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:89)
> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:277)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> {noformat}





[jira] [Resolved] (MAHOUT-1876) Mahout fails to read from lucene index of solr-6.1.0

2016-08-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1876.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Merged. Thanks again for the contribution.

> Mahout fails to read from lucene index of solr-6.1.0
> 
>
> Key: MAHOUT-1876
> URL: https://issues.apache.org/jira/browse/MAHOUT-1876
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
> Environment: Solr: 6.1.0
> JDK: 1.8.0_92
> Mahout: 0.12.2
> OS: Linux
>Reporter: Raviteja Lokineni
> Fix For: 0.13.0
>
>
> Command: {noformat}bin/mahout lucene.vector --dir 
> ~/softwares/solr-6.1.0/server/solr/nlp-core/data/index --output 
> /tmp/solr-nlp-core/out.vec --field rspns_val --dictOut 
> /tmp/solr-nlp-core/dictionary.txt --norm 2{noformat}
> Stacktrace:
> {noformat}
> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running 
> locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-examples-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-mr-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/lib/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" 
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource: 
> ChecksumIndexInput(MMapIndexInput(path="/home/lok268/softwares/solr-6.1.0/server/solr/nlp-core/data/index/segments_2"))):
>  6 (needs to be between 0 and 1)
> at 
> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:148)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:329)
> at 
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
> at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at 
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
> at 
> org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:89)
> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:277)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> {noformat}





[jira] [Updated] (MAHOUT-1876) Mahout fails to read from lucene index of solr-5.5.2

2016-08-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1876:
--
Summary: Mahout fails to read from lucene index of solr-5.5.2  (was: Mahout 
fails to read from lucene index of solr-6.1.0)

> Mahout fails to read from lucene index of solr-5.5.2
> 
>
> Key: MAHOUT-1876
> URL: https://issues.apache.org/jira/browse/MAHOUT-1876
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
> Environment: Solr: 6.1.0
> JDK: 1.8.0_92
> Mahout: 0.12.2
> OS: Linux
>Reporter: Raviteja Lokineni
> Fix For: 0.13.0
>
>
> Command: {noformat}bin/mahout lucene.vector --dir 
> ~/softwares/solr-6.1.0/server/solr/nlp-core/data/index --output 
> /tmp/solr-nlp-core/out.vec --field rspns_val --dictOut 
> /tmp/solr-nlp-core/dictionary.txt --norm 2{noformat}
> Stacktrace:
> {noformat}
> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running 
> locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-examples-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-mr-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/lib/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" 
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource: 
> ChecksumIndexInput(MMapIndexInput(path="/home/lok268/softwares/solr-6.1.0/server/solr/nlp-core/data/index/segments_2"))):
>  6 (needs to be between 0 and 1)
> at 
> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:148)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:329)
> at 
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
> at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at 
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
> at 
> org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:89)
> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:277)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> {noformat}





[jira] [Resolved] (MAHOUT-1742) non-legacy framework related issues

2016-08-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1742.
---
Resolution: Implemented

> non-legacy framework related issues
> ---
>
> Key: MAHOUT-1742
> URL: https://issues.apache.org/jira/browse/MAHOUT-1742
> Project: Mahout
>  Issue Type: Epic
>Reporter: Dmitriy Lyubimov
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>






[jira] [Resolved] (MAHOUT-1877) Switch to Flink 1.1.0

2016-08-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1877.
---
Resolution: Implemented

> Switch to Flink 1.1.0
> -
>
> Key: MAHOUT-1877
> URL: https://issues.apache.org/jira/browse/MAHOUT-1877
> Project: Mahout
>  Issue Type: Improvement
>  Components: Flink
>Affects Versions: 0.12.2
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Switch to Flink 1.1.0





[jira] [Created] (MAHOUT-1877) Switch to Flink 1.1.0

2016-08-08 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1877:
-

 Summary: Switch to Flink 1.1.0
 Key: MAHOUT-1877
 URL: https://issues.apache.org/jira/browse/MAHOUT-1877
 Project: Mahout
  Issue Type: Improvement
  Components: Flink
Affects Versions: 0.12.2
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 0.13.0


Switch to Flink 1.1.0





[jira] [Commented] (MAHOUT-1876) Mahout fails to read from lucene index of solr-6.1.0

2016-07-19 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384581#comment-15384581
 ] 

Suneel Marthi commented on MAHOUT-1876:
---

To add more context to the previous post: we tried moving to Lucene 4.10.x back 
in March 2015, but that completely broke the vectorization code in the legacy 
MapReduce module and failed all tests. 

If you are willing to take a stab at this and upgrade to Lucene 6.1.x, please 
reach out on dev@ and we can talk there.

> Mahout fails to read from lucene index of solr-6.1.0
> 
>
> Key: MAHOUT-1876
> URL: https://issues.apache.org/jira/browse/MAHOUT-1876
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
> Environment: Solr: 6.1.0
> JDK: 1.8.0_92
> Mahout: 0.12.2
> OS: Linux
>Reporter: Raviteja Lokineni
>
> Command: {noformat}bin/mahout lucene.vector --dir 
> ~/softwares/solr-6.1.0/server/solr/nlp-core/data/index --output 
> /tmp/solr-nlp-core/out.vec --field rspns_val --dictOut 
> /tmp/solr-nlp-core/dictionary.txt --norm 2{noformat}
> Stacktrace:
> {noformat}
> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running 
> locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-examples-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-mr-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/lib/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" 
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource: 
> ChecksumIndexInput(MMapIndexInput(path="/home/lok268/softwares/solr-6.1.0/server/solr/nlp-core/data/index/segments_2"))):
>  6 (needs to be between 0 and 1)
> at 
> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:148)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:329)
> at 
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
> at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at 
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
> at 
> org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:89)
> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:277)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> {noformat}





[jira] [Commented] (MAHOUT-1876) Mahout fails to read from lucene index of solr-6.1.0

2016-07-19 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384544#comment-15384544
 ] 

Suneel Marthi commented on MAHOUT-1876:
---

Yes, this is not supported. Mahout is still on Lucene 4.6.x and hence is not 
compatible with Solr 6.x.

> Mahout fails to read from lucene index of solr-6.1.0
> 
>
> Key: MAHOUT-1876
> URL: https://issues.apache.org/jira/browse/MAHOUT-1876
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
> Environment: Solr: 6.1.0
> JDK: 1.8.0_92
> Mahout: 0.12.2
> OS: Linux
>Reporter: Raviteja Lokineni
>
> Command: {noformat}bin/mahout lucene.vector --dir 
> ~/softwares/solr-6.1.0/server/solr/nlp-core/data/index --output 
> /tmp/solr-nlp-core/out.vec --field rspns_val --dictOut 
> /tmp/solr-nlp-core/dictionary.txt --norm 2{noformat}
> Stacktrace:
> {noformat}
> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running 
> locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-examples-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/mahout-mr-0.12.2-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/lok268/softwares/apache-mahout-distribution-0.12.2/lib/slf4j-log4j12-1.7.19.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" 
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource: 
> ChecksumIndexInput(MMapIndexInput(path="/home/lok268/softwares/solr-6.1.0/server/solr/nlp-core/data/index/segments_2"))):
>  6 (needs to be between 0 and 1)
> at 
> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:148)
> at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:329)
> at 
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
> at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at 
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
> at 
> org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:89)
> at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:277)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> {noformat}





[jira] [Assigned] (MAHOUT-1870) Add import and export capabilities for DRMs to and from Apache Arrow

2016-06-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1870:
-

Assignee: Suneel Marthi

> Add import and export capabilities for DRMs to and from Apache Arrow
> 
>
> Key: MAHOUT-1870
> URL: https://issues.apache.org/jira/browse/MAHOUT-1870
> Project: Mahout
>  Issue Type: New Feature
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> We need to add the capability to import DRMs from and export DRMs to Apache 
> Arrow.   This will be part of a greater effort to make integration more 
> seamless with other projects. In some cases (e.g. exporting to csv or tsv) we 
> will allow for a loss in precision.





[jira] [Resolved] (MAHOUT-1867) upgrade 3rd party jars prior to next release

2016-05-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1867.
---
Resolution: Fixed

> upgrade 3rd party jars prior to next release
> 
>
> Key: MAHOUT-1867
> URL: https://issues.apache.org/jira/browse/MAHOUT-1867
> Project: Mahout
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.12.1
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Trivial
> Fix For: 0.13.0
>
>






[jira] [Work started] (MAHOUT-1867) upgrade 3rd party jars prior to next release

2016-05-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1867 started by Suneel Marthi.
-
> upgrade 3rd party jars prior to next release
> 
>
> Key: MAHOUT-1867
> URL: https://issues.apache.org/jira/browse/MAHOUT-1867
> Project: Mahout
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.12.1
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Trivial
> Fix For: 0.13.0
>
>






[jira] [Created] (MAHOUT-1867) upgrade 3rd party jars prior to next release

2016-05-29 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1867:
-

 Summary: upgrade 3rd party jars prior to next release
 Key: MAHOUT-1867
 URL: https://issues.apache.org/jira/browse/MAHOUT-1867
 Project: Mahout
  Issue Type: Improvement
  Components: build
Affects Versions: 0.12.1
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Trivial
 Fix For: 0.13.0








[jira] [Commented] (MAHOUT-1698) Streaming K-means and Fuzzy K-means to output clusteredPoints

2016-05-29 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306002#comment-15306002
 ] 

Suneel Marthi commented on MAHOUT-1698:
---

Resolving this as 'Won't Fix', since this is legacy MapReduce code. If you have 
a patch, please feel free to create a new JIRA and submit a PR.

> Streaming K-means and Fuzzy K-means to output clusteredPoints
> -
>
> Key: MAHOUT-1698
> URL: https://issues.apache.org/jira/browse/MAHOUT-1698
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.10.0
>Reporter: Sujit Thumma
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Similar to the K-Means algorithm, is there a way for streaming K-means and 
> Fuzzy K-means to output clustered points in MapReduce? This could be used to 
> map documents to cluster IDs.  As of now only K-means can output clustered 
> points; streaming k-means just outputs centroids.





[jira] [Resolved] (MAHOUT-1698) Streaming K-means and Fuzzy K-means to output clusteredPoints

2016-05-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1698.
---
   Resolution: Won't Fix
Fix Version/s: 0.13.0

> Streaming K-means and Fuzzy K-means to output clusteredPoints
> -
>
> Key: MAHOUT-1698
> URL: https://issues.apache.org/jira/browse/MAHOUT-1698
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Affects Versions: 0.10.0
>Reporter: Sujit Thumma
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Similar to the K-Means algorithm, is there a way for streaming K-means and 
> Fuzzy K-means to output clustered points in MapReduce? This could be used to 
> map documents to cluster IDs.  As of now only K-means can output clustered 
> points; streaming k-means just outputs centroids.





[jira] [Resolved] (MAHOUT-1866) Add matrix-to-tsv string function

2016-05-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1866.
---
Resolution: Implemented

> Add matrix-to-tsv string function
> -
>
> Key: MAHOUT-1866
> URL: https://issues.apache.org/jira/browse/MAHOUT-1866
> Project: Mahout
>  Issue Type: Sub-task
>  Components: visiualization
>Affects Versions: 0.12.1
>Reporter: Trevor Grant
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Need a function to convert a matrix to a TSV string which can then be
> - plotted by Zeppelin %table visualization packages
> - passed to R / Python via the Zeppelin Resource Manager
> It has been noted that a matrix can be registered as an RDD and passed across 
> contexts directly in Spark; however, this breaks the 'backend agnostic' 
> philosophy.  Until H2O and Flink both support Python / R environments, it 
> is more reasonable to use tab-separated-value strings.
> Further, matrices might be extremely large and unfit for being directly 
> converted to TSVs.  It may be wise to introduce some sort of safety valve to 
> prevent excessively large matrices from being materialized into local 
> memory (e.g. if the user hasn't called their own sampling method on a 
> matrix).
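The safety-valve idea described above can be sketched in plain Java. This is a
minimal illustration only: the names `matrixToTsv` and `maxRows` are
hypothetical and are not part of any actual Mahout API; a real implementation
would operate on Mahout's Matrix/DRM types rather than a raw 2-D array.

```java
// Hypothetical sketch of a matrix-to-TSV export with a size safety valve.
// None of these names come from the Mahout codebase.
public class MatrixToTsv {

    // Convert an in-memory matrix to a tab-separated string.
    // maxRows is the "safety valve": refuse to materialize more than
    // maxRows rows into local memory, forcing the caller to sample first.
    static String matrixToTsv(double[][] m, int maxRows) {
        if (m.length > maxRows) {
            throw new IllegalArgumentException(
                "matrix has " + m.length + " rows; sample before exporting");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < m.length; i++) {
            if (i > 0) sb.append('\n');       // one matrix row per line
            for (int j = 0; j < m[i].length; j++) {
                if (j > 0) sb.append('\t');   // tab-separated columns
                sb.append(m[i][j]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] m = { {1.0, 2.0}, {3.0, 4.0} };
        // Prints two tab-separated rows, suitable for Zeppelin's %table.
        System.out.println(matrixToTsv(m, 10000));
    }
}
```

A string produced this way can be handed to Zeppelin's %table display or parsed
on the R / Python side without any cross-backend dependency, which is the
point of the TSV approach over passing an RDD directly.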





[jira] [Updated] (MAHOUT-1866) Add matrix-to-tsv string function

2016-05-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1866:
--
Affects Version/s: 0.12.1
  Component/s: visiualization

> Add matrix-to-tsv string function
> -
>
> Key: MAHOUT-1866
> URL: https://issues.apache.org/jira/browse/MAHOUT-1866
> Project: Mahout
>  Issue Type: Sub-task
>  Components: visiualization
>Affects Versions: 0.12.1
>Reporter: Trevor Grant
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Need a function to convert a matrix to a TSV string which can then be
> - plotted by Zeppelin %table visualization packages
> - passed to R / Python via the Zeppelin Resource Manager
> It has been noted that a matrix can be registered as an RDD and passed across 
> contexts directly in Spark; however, this breaks the 'backend agnostic' 
> philosophy.  Until H2O and Flink both support Python / R environments, it 
> is more reasonable to use tab-separated-value strings.
> Further, matrices might be extremely large and unfit for being directly 
> converted to TSVs.  It may be wise to introduce some sort of safety valve to 
> prevent excessively large matrices from being materialized into local 
> memory (e.g. if the user hasn't called their own sampling method on a 
> matrix).





[jira] [Work started] (MAHOUT-1866) Add matrix-to-tsv string function

2016-05-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1866 started by Suneel Marthi.
-
> Add matrix-to-tsv string function
> -
>
> Key: MAHOUT-1866
> URL: https://issues.apache.org/jira/browse/MAHOUT-1866
> Project: Mahout
>  Issue Type: Sub-task
>Reporter: Trevor Grant
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Need a function to convert a matrix to a TSV string which can then be
> - plotted by Zeppelin %table visualization packages
> - passed to R / Python via the Zeppelin Resource Manager
> It has been noted that a matrix can be registered as an RDD and passed across 
> contexts directly in Spark; however, this breaks the 'backend agnostic' 
> philosophy.  Until H2O and Flink both support Python / R environments, it 
> is more reasonable to use tab-separated-value strings.
> Further, matrices might be extremely large and unfit for being directly 
> converted to TSVs.  It may be wise to introduce some sort of safety valve to 
> prevent excessively large matrices from being materialized into local 
> memory (e.g. if the user hasn't called their own sampling method on a 
> matrix).





[jira] [Assigned] (MAHOUT-1866) Add matrix-to-tsv string function

2016-05-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1866:
-

Assignee: Suneel Marthi

> Add matrix-to-tsv string function
> -
>
> Key: MAHOUT-1866
> URL: https://issues.apache.org/jira/browse/MAHOUT-1866
> Project: Mahout
>  Issue Type: Sub-task
>Reporter: Trevor Grant
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Need a function to convert a matrix to a tsv string which can then be plotted 
> by
> - Zeppelin %table visualization packages
> - Passed to R / Python via Zeppelin Resource Manager
> It has been noted that a matrix can be registered as an RDD and passed across 
> contexts directly in Spark; however, this breaks the 'backend agnostic' 
> philosophy.  Until H2O and Flink both support Python / R environments, it 
> is more reasonable to use tab-separated-value strings.
> Further, matrices might be extremely large and unfit to be directly 
> converted to TSVs.  It may be wise to introduce some sort of safety valve to 
> prevent excessively large matrices from being materialized into local 
> memory (e.g., if the user hasn't called their own sampling method on a 
> matrix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1854) Zeppelin integration: Spark Intrepreter

2016-05-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305744#comment-15305744
 ] 

Suneel Marthi commented on MAHOUT-1854:
---

[~rawkintrevo] Please feel free to start the discussions around this. All of us 
have been busy earning our paychecks and haven't had a chance to brainstorm 
this.

Since this is up your alley, why don't you start the discussion? We can 
comment on this JIRA.

> Zeppelin integration: Spark Intrepreter
> ---
>
> Key: MAHOUT-1854
> URL: https://issues.apache.org/jira/browse/MAHOUT-1854
> Project: Mahout
>  Issue Type: New Feature
>Affects Versions: 0.12.0
>Reporter: Andrew Palumbo
>Assignee: Pat Ferrel
> Fix For: 0.13.0
>
>
> Integrate Mahout with Zeppelin by creating a Zeppelin Interpreter for Mahout 
> first.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1865) Remove Hadoop 1 support.

2016-05-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305457#comment-15305457
 ] 

Suneel Marthi commented on MAHOUT-1865:
---

Who has admin permissions for Jenkins? [~sslavic]?

> Remove Hadoop 1 support.
> 
>
> Key: MAHOUT-1865
> URL: https://issues.apache.org/jira/browse/MAHOUT-1865
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
> Fix For: 0.13.0
>
>
> Remove support for Hadoop 1.
> 1. Disable Jenkins Hadoop 1 build.
> 2. Remove Hadoop 1 profile from the root {{pom.xml}}.
> 3. Refactor any Hadoop1 specific code *outside of* the {{mr}} module.
> 4. Update documentation.
> 5. Notify user@.
> Hadoop1HDFSUtils is likely the only code that this will affect.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1860) Add Stack Image to the top of the front page of the Website

2016-05-21 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295363#comment-15295363
 ] 

Suneel Marthi commented on MAHOUT-1860:
---

Make sure you have a copyright notice, and add watermarks so the image can't 
be copy-pasted.

> Add Stack Image to the top of the front page of the  Website
> 
>
> Key: MAHOUT-1860
> URL: https://issues.apache.org/jira/browse/MAHOUT-1860
> Project: Mahout
>  Issue Type: Documentation
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.13.0
>
> Attachments: mahout_stack.svg
>
>
> Add a variant of stack.svg - the image of the mahout stack (pg. 64 in the 
> book) "Above the fold" on the site.  This image seems to help people grasp 
> "what mahout is" very quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1799) Read null row vectors from file in TextDelimeterReaderWriter driver

2016-05-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1799:
--
Fix Version/s: (was: 1.0.0)
   0.13.0

> Read null row vectors from file in TextDelimeterReaderWriter driver
> ---
>
> Key: MAHOUT-1799
> URL: https://issues.apache.org/jira/browse/MAHOUT-1799
> Project: Mahout
>  Issue Type: Improvement
>  Components: spark
>Reporter: Jussi Jousimo
>Assignee: Pat Ferrel
>Priority: Minor
> Fix For: 0.13.0
>
>
> Since some row vectors in a sparse matrix can be null, Mahout writes them out 
> to a file with the row label only. However, Mahout cannot read these files 
> back; it throws an exception when it encounters a label-only row.
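A minimal sketch of the tolerant parsing described above, in plain Java. The class and method names are illustrative only, not the actual TextDelimeterReaderWriter API:

```java
public class LabelOnlyRowReader {
    /**
     * Parses one delimited line of the form "label\tv1\tv2\t...".
     * A line carrying only a row label (a written-out null vector)
     * yields an empty vector instead of raising an exception.
     */
    public static double[] parseRow(String line) {
        String[] parts = line.split("\t", 2);
        if (parts.length < 2 || parts[1].isEmpty()) {
            // Label-only row: treat as an empty (null) vector.
            return new double[0];
        }
        String[] vals = parts[1].split("\t");
        double[] v = new double[vals.length];
        for (int i = 0; i < vals.length; i++) {
            v[i] = Double.parseDouble(vals[i]);
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(parseRow("row1\t1.0\t2.0").length); // 2
        System.out.println(parseRow("row2").length);           // 0
    }
}
```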



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1682) Create a documentation page for SPCA

2016-05-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1682:
--
Fix Version/s: (was: 0.12.1)
   0.13.0

> Create a documentation page for SPCA
> 
>
> Key: MAHOUT-1682
> URL: https://issues.apache.org/jira/browse/MAHOUT-1682
> Project: Mahout
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Andrew Palumbo
>Assignee: Andrew Musselman
> Fix For: 0.13.0
>
>
> Following the template of the SSVD and QR pages create a page for SPCA.  This 
> Page would go under Algorithms-> Distributed Matrix Decomposition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1686) Create a documentation page for ALS

2016-05-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1686:
--
Fix Version/s: (was: 0.12.1)
   0.13.0

> Create a documentation page for ALS
> 
>
> Key: MAHOUT-1686
> URL: https://issues.apache.org/jira/browse/MAHOUT-1686
> Project: Mahout
>  Issue Type: Documentation
>Affects Versions: 0.11.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Musselman
> Fix For: 0.13.0
>
>
> Following the template of the SSVD and QR pages create a page for ALS. This 
> Page would go under Algorithms-> Distributed Matrix Decomposition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1830) Publish scaladocs for Mahout 0.12.0 release

2016-05-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1830:
--
Fix Version/s: (was: 0.12.1)
   0.13.0

> Publish scaladocs for Mahout 0.12.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.13.0
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs currently 
> published are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1742) non-legacy framework related issues

2016-05-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1742:
--
Fix Version/s: (was: 0.12.1)
   0.13.0

> non-legacy framework related issues
> ---
>
> Key: MAHOUT-1742
> URL: https://issues.apache.org/jira/browse/MAHOUT-1742
> Project: Mahout
>  Issue Type: Epic
>Reporter: Dmitriy Lyubimov
>Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1799) Read null row vectors from file in TextDelimeterReaderWriter driver

2016-05-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1799.
---
Resolution: Fixed

> Read null row vectors from file in TextDelimeterReaderWriter driver
> ---
>
> Key: MAHOUT-1799
> URL: https://issues.apache.org/jira/browse/MAHOUT-1799
> Project: Mahout
>  Issue Type: Improvement
>  Components: spark
>Reporter: Jussi Jousimo
>Assignee: Pat Ferrel
>Priority: Minor
> Fix For: 1.0.0
>
>
> Since some row vectors in a sparse matrix can be null, Mahout writes them out 
> to a file with the row label only. However, Mahout cannot read these files 
> back; it throws an exception when it encounters a label-only row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1830) Publish scaladocs for Mahout 0.12.0 release

2016-05-15 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1830 started by Suneel Marthi.
-
> Publish scaladocs for Mahout 0.12.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.12.1
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs currently 
> published are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1848) drmSampleKRows in FlinkEngine should generate a dense or sparse matrix

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1848.
---
Resolution: Fixed

> drmSampleKRows in FlinkEngine should generate a dense or sparse matrix
> --
>
> Key: MAHOUT-1848
> URL: https://issues.apache.org/jira/browse/MAHOUT-1848
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> drmSampleKRows in FlinkEngine should generate a dense or sparse matrix based 
> on the type of vector in the sampled Dataset



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1848) drmSampleKRows in FlinkEngine should generate a dense or sparse matrix

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1848 started by Suneel Marthi.
-
> drmSampleKRows in FlinkEngine should generate a dense or sparse matrix
> --
>
> Key: MAHOUT-1848
> URL: https://issues.apache.org/jira/browse/MAHOUT-1848
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> drmSampleKRows in FlinkEngine should generate a dense or sparse matrix based 
> on the type of vector in the sampled Dataset



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAHOUT-1848) drmSampleKRows in FlinkEngine should generate a dense or sparse matrix

2016-05-02 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1848:
-

 Summary: drmSampleKRows in FlinkEngine should generate a dense or 
sparse matrix
 Key: MAHOUT-1848
 URL: https://issues.apache.org/jira/browse/MAHOUT-1848
 Project: Mahout
  Issue Type: Bug
  Components: Flink
Affects Versions: 0.12.0
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 0.12.1


drmSampleKRows in FlinkEngine should generate a dense or sparse matrix based on 
the type of vector in the sampled Dataset



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MAHOUT-1830) Publish scaladocs for Mahout 0.12.0 release

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1830:
--
Comment: was deleted

(was: [~sslavic]   ?? This is on the critical path.)

> Publish scaladocs for Mahout 0.12.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.12.1
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs currently 
> published are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAHOUT-1830) Publish scaladocs for Mahout 0.12.0 release

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1830:
-

Assignee: Suneel Marthi  (was: Stevo Slavic)

> Publish scaladocs for Mahout 0.12.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Critical
> Fix For: 0.12.1
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs currently 
> published are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1847) drmSampleRows in FlinkEngine doesn't wrap Int Keys when ClassTag is of type Int

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1847.
---
Resolution: Fixed

> drmSampleRows in FlinkEngine doesn't wrap Int Keys when ClassTag is of type 
> Int
> ---
>
> Key: MAHOUT-1847
> URL: https://issues.apache.org/jira/browse/MAHOUT-1847
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> drmSampleKRows in FlinkEngine doesn't rekey with Integer keys when wrapping 
> the resulting DataSet into a DRM for a ClassTag of type Int.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1847) drmSampleRows in FlinkEngine doesn't wrap Int Keys when ClassTag is of type Int

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1847:
--
Description: drmSampleKRows in Flinkengine doesn't rekey with Integer keys 
when wrapping the resulting DataSet into a DRM for a classTag of type Int.  
(was: drmSampleKRows in Flinkengine doesn't rekey with Integer keys when 
wrapping the resulting DataSet into a DRM)

> drmSampleRows in FlinkEngine doesn't wrap Int Keys when ClassTag is of type 
> Int
> ---
>
> Key: MAHOUT-1847
> URL: https://issues.apache.org/jira/browse/MAHOUT-1847
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> drmSampleKRows in FlinkEngine doesn't rekey with Integer keys when wrapping 
> the resulting DataSet into a DRM for a ClassTag of type Int.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1847) drmSampleRows in FlinkEngine doesn't wrap Int Keys when ClassTag is of type Int

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1847:
--
Summary: drmSampleRows in FlinkEngine doesn't wrap Int Keys when ClassTag 
is of type Int  (was: drmSampleRows in FlinkEngine)

> drmSampleRows in FlinkEngine doesn't wrap Int Keys when ClassTag is of type 
> Int
> ---
>
> Key: MAHOUT-1847
> URL: https://issues.apache.org/jira/browse/MAHOUT-1847
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> drmSampleKRows in FlinkEngine doesn't rekey with Integer keys when wrapping 
> the resulting DataSet into a DRM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAHOUT-1847) drmSampleRows in FlinkEngine

2016-05-02 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1847:
-

 Summary: drmSampleRows in FlinkEngine
 Key: MAHOUT-1847
 URL: https://issues.apache.org/jira/browse/MAHOUT-1847
 Project: Mahout
  Issue Type: Bug
  Components: Flink
Affects Versions: 0.12.0
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 0.12.1


drmSampleKRows in FlinkEngine doesn't rekey with Integer keys when wrapping the 
resulting DataSet into a DRM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (MAHOUT-1847) drmSampleRows in FlinkEngine

2016-05-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1847 started by Suneel Marthi.
-
> drmSampleRows in FlinkEngine
> 
>
> Key: MAHOUT-1847
> URL: https://issues.apache.org/jira/browse/MAHOUT-1847
> Project: Mahout
>  Issue Type: Bug
>  Components: Flink
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> drmSampleKRows in FlinkEngine doesn't rekey with Integer keys when wrapping 
> the resulting DataSet into a DRM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAHOUT-1841) Matrices.symmetricUniformView(...) returning values in the wrong range.

2016-04-30 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1841.
---
Resolution: Fixed

> Matrices.symmetricUniformView(...) returning values in the wrong range.
> ---
>
> Key: MAHOUT-1841
> URL: https://issues.apache.org/jira/browse/MAHOUT-1841
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.12.0
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> Per the javadocs, {{Matrices.symmetricUniformView(...)}} is meant to return 
> values in the range [-1,1): 
> {code}
> /**
>* Matrix view based on uniform [-1,1) distribution.
>*
>* @param seed generator seed
>*/
>   public static final Matrix symmetricUniformView(final int rows,
>   final int columns,
>   int seed) {
> return functionalMatrixView(rows, columns, 
> uniformSymmetricGenerator(seed), true);
>   }
> {code}
> Values are currently being returned in the range (-0.5, 0.5).
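For reference, a correct uniform [-1,1) generator maps `nextDouble()`, which is uniform on [0,1), through `2*x - 1`. The observed (-0.5, 0.5) range is consistent with a shift-only `x - 0.5` mapping, though that is an assumption about the cause, not a confirmed diagnosis. A minimal self-contained sketch:

```java
import java.util.Random;

public class SymmetricUniform {
    /**
     * Uniform [-1,1) sample, matching the javadoc contract of
     * symmetricUniformView. nextDouble() is uniform on [0,1);
     * scaling by 2 and shifting by -1 maps it onto [-1,1).
     */
    public static double sample(Random rng) {
        return 2.0 * rng.nextDouble() - 1.0;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double min = 1.0, max = -1.0;
        for (int i = 0; i < 100_000; i++) {
            double x = sample(rng);
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        // Observed extremes approach, but never leave, [-1, 1).
        System.out.println(min >= -1.0 && max < 1.0);
    }
}
```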



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAHOUT-1841) Matrices.symmetricUniformView(...) returning values in the wrong range.

2016-04-29 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1841:
-

Assignee: Suneel Marthi

> Matrices.symmetricUniformView(...) returning values in the wrong range.
> ---
>
> Key: MAHOUT-1841
> URL: https://issues.apache.org/jira/browse/MAHOUT-1841
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.12.0
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
> Fix For: 0.12.1
>
>
> Per the javadocs, {{Matrices.symmetricUniformView(...)}} is meant to return 
> values in the range [-1,1): 
> {code}
> /**
>* Matrix view based on uniform [-1,1) distribution.
>*
>* @param seed generator seed
>*/
>   public static final Matrix symmetricUniformView(final int rows,
>   final int columns,
>   int seed) {
> return functionalMatrixView(rows, columns, 
> uniformSymmetricGenerator(seed), true);
>   }
> {code}
> Values are currently being returned in the range (-0.5, 0.5).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1837) Sparse/Dense Matrix analysis for Matrix Multiplication

2016-04-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262703#comment-15262703
 ] 

Suneel Marthi commented on MAHOUT-1837:
---

Wouldn't it be easier to separate the two into their own JIRAs?

> Sparse/Dense Matrix analysis for Matrix Multiplication
> --
>
> Key: MAHOUT-1837
> URL: https://issues.apache.org/jira/browse/MAHOUT-1837
> Project: Mahout
>  Issue Type: Improvement
>  Components: Math
>Affects Versions: 0.12.0
>Reporter: Andrew Palumbo
> Fix For: 0.12.1
>
>
> In matrix multiplication, sparse matrices can easily turn dense and bloat 
> memory: one fully dense column and one fully dense row can cause a sparse 
> %*% sparse operation to have a dense result.  
> There are two issues here, one with a quick fix and one a bit more involved:
>#  In {{ABt.scala}}, check the {{MatrixFlavor}} of the combiner and use 
> the flavor of the block as the resulting sparse or dense matrix type:
> {code}
> val comb = if (block.getFlavor == MatrixFlavor.SPARSELIKE) {
>   new SparseMatrix(prodNCol, block.nrow).t
> } else {
>   new DenseMatrix(prodNCol, block.nrow).t
> }
> {code}
>  A similar check needs to be made in the {{blockify}} transformation.
>  
>#  More importantly, and more involved, is to do an actual analysis of the 
> resulting matrix data in the in-core {{mmul}} class and use a matrix of the 
> appropriate structure as the result. 
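The second point can be sketched as follows: decide the result structure from the computed product's actual nonzero ratio rather than from the operands' declared flavors. This is a hedged illustration only; the 0.25 threshold is arbitrary, not a Mahout constant, and a real implementation would work on Mahout's Matrix type:

```java
public class ResultFlavor {
    /**
     * Scans the product's data and reports whether a dense structure
     * is the better fit, based on the fraction of nonzero cells.
     */
    public static boolean shouldBeDense(double[][] product) {
        long nz = 0, total = 0;
        for (double[] row : product) {
            for (double v : row) {
                total++;
                if (v != 0.0) nz++;
            }
        }
        return total > 0 && (double) nz / total > 0.25;
    }

    public static void main(String[] args) {
        double[][] mostlyZero = {{0, 0, 0, 1}, {0, 0, 0, 0}};
        double[][] dense = {{1, 2}, {3, 4}};
        System.out.println(shouldBeDense(mostlyZero)); // false (1/8 nonzero)
        System.out.println(shouldBeDense(dense));      // true  (4/4 nonzero)
    }
}
```

The scan costs one extra pass over the product, which is cheap relative to the multiplication itself and avoids the memory bloat of a dense-by-default result.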



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MAHOUT-1705) Verify dependencies in job jar for mahout-examples

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed MAHOUT-1705.
-

> Verify dependencies in job jar for mahout-examples
> --
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Musselman
> Fix For: 0.12.0
>
>
> mahout-example-*-job.jar is ~56M and may package unused runtime 
> libraries.  We need to go through it and make sure that there is nothing 
> unneeded or redundant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1705) Verify dependencies in job jar for mahout-examples

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1705:
--
Fix Version/s: 0.12.0

> Verify dependencies in job jar for mahout-examples
> --
>
> Key: MAHOUT-1705
> URL: https://issues.apache.org/jira/browse/MAHOUT-1705
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Musselman
> Fix For: 0.12.0
>
>
> mahout-example-*-job.jar is ~56M and may package unused runtime 
> libraries.  We need to go through it and make sure that there is nothing 
> unneeded or redundant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MAHOUT-1740) Layout on algorithms page broken

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed MAHOUT-1740.
-

> Layout on algorithms page broken
> 
>
> Key: MAHOUT-1740
> URL: https://issues.apache.org/jira/browse/MAHOUT-1740
> Project: Mahout
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
> Fix For: 0.12.0
>
>
> http://mahout.apache.org/users/basics/algorithms.html
> On Chrome on Linux the main body content is bleeding into the right nav. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MAHOUT-1811) Fix calculation of second norm of DRM in Flink

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed MAHOUT-1811.
-

> Fix calculation of second norm of DRM in Flink
> --
>
> Key: MAHOUT-1811
> URL: https://issues.apache.org/jira/browse/MAHOUT-1811
> Project: Mahout
>  Issue Type: Bug
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1764) Mahout DSL for Flink: Add standard backend tests for Flink

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1764:
--
Fix Version/s: 0.12.0

> Mahout DSL for Flink: Add standard backend tests for Flink
> --
>
> Key: MAHOUT-1764
> URL: https://issues.apache.org/jira/browse/MAHOUT-1764
> Project: Mahout
>  Issue Type: Task
>  Components: Math
>Reporter: Alexey Grigorev
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 0.12.0
>
>
> From github comment by Dmitriy:
> also on the topic of test suite coverage: we need to pass our standard tests. 
> The base classes for them are:
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/decompositions/DistributedDecompositionsSuiteBase.scala
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/DrmLikeOpsSuiteBase.scala
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/DrmLikeSuiteBase.scala
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/RLikeDrmOpsSuiteBase.scala
> The technique here is to take these test cases as a base class for a 
> distributed test case (you may want to see how it was done for Spark and 
> H2O). This is our basic assertion that our main algorithms are passing on a 
> toy problem for a given backend.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MAHOUT-1764) Mahout DSL for Flink: Add standard backend tests for Flink

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed MAHOUT-1764.
-

> Mahout DSL for Flink: Add standard backend tests for Flink
> --
>
> Key: MAHOUT-1764
> URL: https://issues.apache.org/jira/browse/MAHOUT-1764
> Project: Mahout
>  Issue Type: Task
>  Components: Math
>Reporter: Alexey Grigorev
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 0.12.0
>
>
> From github comment by Dmitriy:
> also on the topic of test suite coverage: we need to pass our standard tests. 
> The base classes for them are:
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/decompositions/DistributedDecompositionsSuiteBase.scala
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/DrmLikeOpsSuiteBase.scala
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/DrmLikeSuiteBase.scala
> https://github.com/apache/mahout/blob/master/math-scala/src/test/scala/org/apache/mahout/math/drm/RLikeDrmOpsSuiteBase.scala
> The technique here is to take these test cases as a base class for a 
> distributed test case (you may want to see how it was done for Spark and 
> H2O). This is our basic assertion that our main algorithms are passing on a 
> toy problem for a given backend.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1811) Fix calculation of second norm of DRM in Flink

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1811:
--
Fix Version/s: 0.12.0

> Fix calculation of second norm of DRM in Flink
> --
>
> Key: MAHOUT-1811
> URL: https://issues.apache.org/jira/browse/MAHOUT-1811
> Project: Mahout
>  Issue Type: Bug
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1740) Layout on algorithms page broken

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1740:
--
Fix Version/s: 0.12.0

> Layout on algorithms page broken
> 
>
> Key: MAHOUT-1740
> URL: https://issues.apache.org/jira/browse/MAHOUT-1740
> Project: Mahout
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
> Fix For: 0.12.0
>
>
> http://mahout.apache.org/users/basics/algorithms.html
> On Chrome on Linux the main body content is bleeding into the right nav. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAHOUT-1777) move HDFSUtil classes into the HDFS module

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1777:
--
Fix Version/s: 0.12.0

> move HDFSUtil classes into the HDFS module
> --
>
> Key: MAHOUT-1777
> URL: https://issues.apache.org/jira/browse/MAHOUT-1777
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Andrew Palumbo
> Fix For: 0.12.0
>
>
> The HDFSUtil classes are used by the Spark, H2O, and Flink modules and 
> implemented separately in each.  Move them to the common HDFS module.  The 
> Spark implementation includes a {{delete(path: String)}} method used by the 
> Spark Naive Bayes CLI; otherwise the implementations are nearly identical.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MAHOUT-1777) move HDFSUtil classes into the HDFS module

2016-04-27 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed MAHOUT-1777.
-

> move HDFSUtil classes into the HDFS module
> --
>
> Key: MAHOUT-1777
> URL: https://issues.apache.org/jira/browse/MAHOUT-1777
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Andrew Palumbo
>
> The HDFSUtil classes are used by the Spark, H2O, and Flink modules and 
> implemented separately in each.  Move them to the common HDFS module.  The 
> Spark implementation includes a {{delete(path: String)}} method used by the 
> Spark Naive Bayes CLI; otherwise the implementations are nearly identical.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

