[jira] [Updated] (MAHOUT-1988) ViennaCL and OMP not building for Scala 2.11

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1988:
-
Summary: ViennaCL and OMP not building for Scala 2.11  (was:  scala 2.10 is 
hardcoded somewhere)

> ViennaCL and OMP not building for Scala 2.11
> 
>
> Key: MAHOUT-1988
> URL: https://issues.apache.org/jira/browse/MAHOUT-1988
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Andrew Palumbo
>Assignee: Trevor Grant
>Priority: Blocker
> Fix For: 0.13.1
>
>
> After building mahout against scala 2.11: 
> {code}
> mvn clean install -Dscala.version=2.11.4 -Dscala.compat.version=2.11 
> -Phadoop2  -DskipTests
> {code}
> ViennaCL jars are built hard-coded to scala 2.10.  This is currently blocking 
> the 0.13.1 release. 
> {code}
> mahout-h2o_2.11-0.13.1-SNAPSHOT.jar
> mahout-hdfs-0.13.1-SNAPSHOT.jar
> mahout-math-0.13.1-SNAPSHOT.jar
> mahout-math-scala_2.11-0.13.1-SNAPSHOT.jar
> mahout-mr-0.13.1-SNAPSHOT.jar
> mahout-native-cuda_2.10-0.13.0-SNAPSHOT.jar
> mahout-native-cuda_2.10-0.13.1-SNAPSHOT.jar
> mahout-native-viennacl_2.10-0.13.1-SNAPSHOT.jar
> mahout-native-viennacl-omp_2.10-0.13.1-SNAPSHOT.jar
> mahout-spark_2.11-0.13.1-SNAPSHOT-dependency-reduced.jar
> mahout-spark_2.11-0.13.1-SNAPSHOT.jar
> {code} 
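For reference, the likely shape of the fix (a hedged sketch only: the module layout is assumed, and the property name is taken from the build command above, not from the actual poms) is to parameterize the native modules' artifact names on scala.compat.version instead of hardcoding the _2.10 suffix:

```xml
<!-- hypothetical fragment for e.g. viennacl/pom.xml; exact module layout is an assumption -->
<artifactId>mahout-native-viennacl_${scala.compat.version}</artifactId>
```

With that in place, `-Dscala.compat.version=2.11` would flow through to the jar names just as it already does for mahout-math-scala and mahout-spark.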



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1965) Update CI to cover multiple spark/scala builds

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1965:
-
Fix Version/s: 0.13.1

> Update CI to cover multiple spark/scala builds
> --
>
> Key: MAHOUT-1965
> URL: https://issues.apache.org/jira/browse/MAHOUT-1965
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
>Assignee: Trevor Grant
> Fix For: 0.13.1
>
>
> For 0.13.1 we are targeting releases for Spark 1.6/2.0/2.1, with Scala 
> 2.10/2.11/2.11 respectively.
> Need to update TravisCI to cover all of these.
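One way the CI matrix could be expressed (a sketch under assumptions: the environment-variable names and exact Spark point versions are hypothetical, not taken from Mahout's actual .travis.yml):

```yaml
# hypothetical .travis.yml build matrix sketch
matrix:
  include:
    - env: SPARK_VERSION=1.6.3 SCALA_COMPAT=2.10
    - env: SPARK_VERSION=2.0.2 SCALA_COMPAT=2.11
    - env: SPARK_VERSION=2.1.0 SCALA_COMPAT=2.11
script:
  - mvn clean install -Dspark.version=$SPARK_VERSION
    -Dscala.compat.version=$SCALA_COMPAT -Phadoop2 -DskipTests
```

Each matrix row then runs the full build once per supported Spark/Scala pairing.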



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1967) Make Flink Profile

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1967:
-
Fix Version/s: 0.13.1

> Make Flink Profile
> --
>
> Key: MAHOUT-1967
> URL: https://issues.apache.org/jira/browse/MAHOUT-1967
> Project: Mahout
>  Issue Type: Improvement
>  Components: Flink
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
>Assignee: Aditya AS
>  Labels: beginner
> Fix For: 0.13.1
>
>
> Currently the Flink module is just commented out of the pom. This is tacky. 
> Flink should have a profile which is disabled by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1970) Utilize Spark Pseudoclusters in TravisCi

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1970:
-
Fix Version/s: 0.13.1

> Utilize Spark Pseudoclusters in TravisCi
> 
>
> Key: MAHOUT-1970
> URL: https://issues.apache.org/jira/browse/MAHOUT-1970
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
>Assignee: Trevor Grant
> Fix For: 0.13.1
>
>
> In RCs we always need to test everything in Spark pseudoclusters, as bugs 
> occasionally work their way in (jars not shipping, etc.; things don't always 
> behave quite the same in a pseudocluster as they do in local[*]).
> As we proliferate our supported Spark versions, we need to automate this 
> with TravisCI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1976) Add Canopy Clustering Algorithm

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1976:
-
Fix Version/s: 0.13.1

> Add Canopy Clustering Algorithm
> ---
>
> Key: MAHOUT-1976
> URL: https://issues.apache.org/jira/browse/MAHOUT-1976
> Project: Mahout
>  Issue Type: Improvement
>  Components: Algorithms
>Affects Versions: 0.13.2
>Reporter: Trevor Grant
>Assignee: Trevor Grant
> Fix For: 0.13.1
>
>
> Primarily, we need to lay out the clustering section of the Algorithms 
> Framework.
> The Canopy Clustering Algorithm is very simple and yet very useful as a 
> preprocessing step for more advanced clustering algorithms such as KMeans and 
> Hierarchical Clustering. 
> https://en.wikipedia.org/wiki/Canopy_clustering_algorithm
> The majority of the "work" on this PR will be creating the framework. 
> It is also one of the Legacy MR algorithms that would be nice to port.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1973) When building profiles conditionally (say Flink, Viennacl) a hadoop.version related error occurs. Need to check if conditional building of other modules also has this er

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1973:
-
Fix Version/s: (was: 0.13.2)
   0.13.1

> When building profiles conditionally (say Flink, Viennacl) a hadoop.version 
> related error occurs. Need to check if conditional building of other modules 
> also has this error and fix the issue.
> ---
>
> Key: MAHOUT-1973
> URL: https://issues.apache.org/jira/browse/MAHOUT-1973
> Project: Mahout
>  Issue Type: Bug
>  Components: build
>Reporter: Aditya AS
>Assignee: Aditya AS
>Priority: Minor
> Fix For: 0.13.1
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1795) Release Scala 2.11 bindings

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1795:
-
Fix Version/s: (was: 0.13.2)
   0.13.1

> Release Scala 2.11 bindings
> ---
>
> Key: MAHOUT-1795
> URL: https://issues.apache.org/jira/browse/MAHOUT-1795
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Mike Kaplinskiy
> Fix For: 0.13.1
>
> Attachments: patch.diff
>
>
> It would be nice to ship scala 2.11 bindings for mahout-math/mahout-spark. 
> (I'm not sure of other users, but mahout-shell isn't nearly at the top of my 
> list here).
> It looks simple enough for those two - the attached patch is a 
> proof-of-concept to compile (and pass all tests) under scala 2.11. I'm not 
> sure what the proper way to do this is, but it doesn't look too daunting. 
> (Famous last words?)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1969) Create Profiles for Spark 1.6, 2.0.2, 2.1.0

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1969:
-
Fix Version/s: (was: 0.13.2)
   0.13.1

> Create Profiles for Spark 1.6, 2.0.2, 2.1.0
> ---
>
> Key: MAHOUT-1969
> URL: https://issues.apache.org/jira/browse/MAHOUT-1969
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
>Assignee: Aditya AS
>  Labels: beginner
> Fix For: 0.13.1
>
>
> Create profiles for Spark 1.6, 2.0.2 and 2.1.0.  Spark 1.6 should be the 
> default. Update CI tests to use profiles instead of variable setting.
> Further, Spark 1.6 should invoke the scala 2.10 profile by default, and Spark 
> 2.x should invoke scala 2.11.
> As such, MAHOUT-1968 is a blocker.
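The requested profiles might look something like the following (a sketch only; the profile ids, default point versions, and property names are assumptions, not the actual Mahout parent pom):

```xml
<!-- hypothetical sketch for the parent pom -->
<profiles>
  <profile>
    <id>spark-1.6</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <properties>
      <spark.version>1.6.3</spark.version>
      <scala.compat.version>2.10</scala.compat.version>
    </properties>
  </profile>
  <profile>
    <id>spark-2.1</id>
    <properties>
      <spark.version>2.1.0</spark.version>
      <scala.compat.version>2.11</scala.compat.version>
    </properties>
  </profile>
</profiles>
```

A build like `mvn clean install -Pspark-2.1` would then select Spark 2.1.0 with Scala 2.11 without any -D flags, which is what lets CI switch on profile names alone.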



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1969) Create Profiles for Spark 1.6, 2.0.2, 2.1.0

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1969:
-
Fix Version/s: 0.13.2

> Create Profiles for Spark 1.6, 2.0.2, 2.1.0
> ---
>
> Key: MAHOUT-1969
> URL: https://issues.apache.org/jira/browse/MAHOUT-1969
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
>Assignee: Aditya AS
>  Labels: beginner
> Fix For: 0.13.2
>
>
> Create profiles for Spark 1.6, 2.0.2 and 2.1.0.  Spark 1.6 should be the 
> default. Update CI tests to use profiles instead of variable setting.
> Further, Spark 1.6 should invoke the scala 2.10 profile by default, and Spark 
> 2.x should invoke scala 2.11.
> As such, MAHOUT-1968 is a blocker.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1973) When building profiles conditionally (say Flink, Viennacl) a hadoop.version related error occurs. Need to check if conditional building of other modules also has this er

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1973:
-
Fix Version/s: 0.13.2

> When building profiles conditionally (say Flink, Viennacl) a hadoop.version 
> related error occurs. Need to check if conditional building of other modules 
> also has this error and fix the issue.
> ---
>
> Key: MAHOUT-1973
> URL: https://issues.apache.org/jira/browse/MAHOUT-1973
> Project: Mahout
>  Issue Type: Bug
>  Components: build
>Reporter: Aditya AS
>Assignee: Aditya AS
>Priority: Minor
> Fix For: 0.13.2
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1795) Release Scala 2.11 bindings

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1795:
-
Fix Version/s: (was: 1.0.0)
   0.13.2

> Release Scala 2.11 bindings
> ---
>
> Key: MAHOUT-1795
> URL: https://issues.apache.org/jira/browse/MAHOUT-1795
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Mike Kaplinskiy
> Fix For: 0.13.2
>
> Attachments: patch.diff
>
>
> It would be nice to ship scala 2.11 bindings for mahout-math/mahout-spark. 
> (I'm not sure of other users, but mahout-shell isn't nearly at the top of my 
> list here).
> It looks simple enough for those two - the attached patch is a 
> proof-of-concept to compile (and pass all tests) under scala 2.11. I'm not 
> sure what the proper way to do this is, but it doesn't look too daunting. 
> (Famous last words?)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1830) Publish scaladocs for Mahout 0.13.0 release

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1830:
-
Fix Version/s: (was: 0.13.0)
   0.13.1

> Publish scaladocs for Mahout 0.13.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Priority: Critical
>  Labels: Newbie
> Fix For: 0.13.1
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs currently published 
> are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (MAHOUT-1830) Publish scaladocs for Mahout 0.13.0 release

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant resolved MAHOUT-1830.
--
Resolution: Fixed

> Publish scaladocs for Mahout 0.13.0 release
> ---
>
> Key: MAHOUT-1830
> URL: https://issues.apache.org/jira/browse/MAHOUT-1830
> Project: Mahout
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0
>Reporter: Suneel Marthi
>Priority: Critical
>  Labels: Newbie
> Fix For: 0.13.0
>
>
> Need to publish scaladocs for Mahout 0.12.0; the scaladocs currently published 
> are from the 0.10.2 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1972) Create Quickstart Writeup for Mahout 0.13.0 documentation page

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1972:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Create Quickstart Writeup for Mahout 0.13.0 documentation page
> --
>
> Key: MAHOUT-1972
> URL: https://issues.apache.org/jira/browse/MAHOUT-1972
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation, website
>Affects Versions: 0.13.1
>Reporter: Dustin VanStee
>Priority: Minor
>  Labels: beginner, documentation
> Fix For: 0.13.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> A new quickstart page needs to be constructed to help first-time users 
> quickly do something with Mahout. Ideas include using the Mahout shell, 
> or a quickstart pairing Mahout with your favorite notebook.
> File to modify is website/_pages/docs/0.13.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1964) Logo in the spark-shell is broken

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1964:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Logo in the spark-shell is broken
> -
>
> Key: MAHOUT-1964
> URL: https://issues.apache.org/jira/browse/MAHOUT-1964
> Project: Mahout
>  Issue Type: Bug
>  Components: Mahout spark shell
>Affects Versions: 0.13.0
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
>  Labels: beginner
> Fix For: 0.13.2
>
>
> Mahout logo in the shell has a few characters misplaced.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1952) Allow pass-through of params for driver's CLI to spark-submit

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1952:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Allow pass-through of params for driver's CLI to spark-submit
> -
>
> Key: MAHOUT-1952
> URL: https://issues.apache.org/jira/browse/MAHOUT-1952
> Project: Mahout
>  Issue Type: New Feature
>  Components: Classification, CLI, Collaborative Filtering
>Affects Versions: 0.13.0
> Environment: CLI drivers launched from mahout script
>Reporter: Pat Ferrel
>Assignee: Pat Ferrel
>Priority: Minor
>  Labels: CLI
> Fix For: 0.13.2
>
>
> Remove driver CLI args that are dups of what spark-submit can do, and allow 
> passthrough of arbitrary extra CLI args to spark-submit using spark-submit 
> parsing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1790) SparkEngine nnz overflow resultSize when reducing.

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1790:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> SparkEngine nnz overflow resultSize when reducing.
> --
>
> Key: MAHOUT-1790
> URL: https://issues.apache.org/jira/browse/MAHOUT-1790
> Project: Mahout
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.11.1
>Reporter: Michel Lemay
>Assignee: Andrew Palumbo
>Priority: Minor
> Fix For: 0.13.2
>
>
> When counting numNonZeroElementsPerColumn in spark engine with large number 
> of columns, we get the following error:
> ERROR TaskSetManager: Total size of serialized results of nnn tasks (1031.7 
> MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
> and then, the call stack:
> org.apache.spark.SparkException: Job aborted due to stage failure: Total size 
> of serialized results of 267 tasks (1024.1 MB) is bigger than 
> spark.driver.maxResultSize (1024.0 MB)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1942)
> at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1003)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
> at org.apache.spark.rdd.RDD.reduce(RDD.scala:985)
> at 
> org.apache.mahout.sparkbindings.SparkEngine$.numNonZeroElementsPerColumn(SparkEngine.scala:86)
> at 
> org.apache.mahout.math.drm.CheckpointedOps.numNonZeroElementsPerColumn(CheckpointedOps.scala:37)
> at 
> org.apache.mahout.math.cf.SimilarityAnalysis$.sampleDownAndBinarize(SimilarityAnalysis.scala:286)
> at 
> org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrences(SimilarityAnalysis.scala:66)
> at 
> org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:141)
> This occurs because it uses a DenseVector, and Spark seemingly aggregates all 
> of them on the driver before reducing.  
> I think this could be easily prevented with a treeReduce(_ += _, depth) 
> instead of a reduce(_ += _).
> 'depth' could be computed as a function of 'n' and numberOfPartitions, 
> something along the lines of:
>   val maxResultSize = 
>   val numPartitions = drm.rdd.partitions.size
>   val n = drm.ncol
>   val bytesPerVector = n * 8 + overhead?
>   val maxVectors = maxResultSize / bytesPerVector / 2 + 1 // be safe
>   val depth = math.max(1, math.ceil(math.log(1 + numPartitions / maxVectors) 
> / math.log(2)).toInt)
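The heuristic sketched in the pseudocode above can be written out as runnable code. This is a sketch under assumptions: the function name and the per-vector overhead constant are hypothetical, and maxResultSize (left blank in the original) is taken as a parameter rather than guessed.

```python
import math

def tree_reduce_depth(num_partitions, ncol, max_result_size,
                      bytes_per_double=8, overhead=64):
    """Pick a treeReduce depth so each level's partial results fit on the driver.

    `overhead` (serialization overhead per vector) is an assumed constant,
    not a value from Mahout or Spark.
    """
    bytes_per_vector = ncol * bytes_per_double + overhead
    # how many dense vectors fit in half the driver's result-size budget
    max_vectors = max_result_size // bytes_per_vector // 2 + 1  # be safe
    # enough tree levels that no level ships more than max_vectors to the driver
    return max(1, math.ceil(math.log(1 + num_partitions / max_vectors)
                            / math.log(2)))
```

With numbers resembling the error above (267 tasks, roughly 500,000-column dense vectors of about 4 MB each, and a 1 GB spark.driver.maxResultSize), this yields a depth of 2, i.e. one intermediate combine level instead of shipping every partial vector straight to the driver.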



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1884) Allow specification of dimensions of a DRM

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1884:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Allow specification of dimensions of a DRM
> --
>
> Key: MAHOUT-1884
> URL: https://issues.apache.org/jira/browse/MAHOUT-1884
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Sebastian Schelter
>Assignee: Andrew Palumbo
>Priority: Minor
> Fix For: 0.13.2
>
>
> Currently, in many cases, a DRM must be read to compute its dimensions when a 
> user calls nrow or ncol. This also implicitly caches the corresponding DRM.
> In some cases, the user actually knows the matrix dimensions (e.g., when the 
> matrices are synthetically generated, or when some metadata about them is 
> known). In such cases, the user should be able to specify the dimensions upon 
> creating the DRM and the caching should be avoided. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MAHOUT-1786) Make classes implements Serializable for Spark 1.5+

2017-06-22 Thread Trevor Grant (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060396#comment-16060396
 ] 

Trevor Grant commented on MAHOUT-1786:
--

[~dlyubimov] close this issue as "won't fix" ?

> Make classes implements Serializable for Spark 1.5+
> ---
>
> Key: MAHOUT-1786
> URL: https://issues.apache.org/jira/browse/MAHOUT-1786
> Project: Mahout
>  Issue Type: Improvement
>  Components: Math
>Affects Versions: 0.11.0
>Reporter: Michel Lemay
>Assignee: Pat Ferrel
>Priority: Minor
>  Labels: performance
> Fix For: 0.13.2
>
>
> Spark 1.5 comes with a new, very efficient serializer that uses code 
> generation.  It is twice as fast as Kryo.  When using Mahout, we have to set 
> KryoSerializer because some classes aren't serializable otherwise.  
> I suggest declaring Math classes as "implements Serializable" where needed.  
> For instance, to use the cooccurrence package in Spark 1.5, we had to modify 
> AbstractMatrix, AbstractVector, DenseVector and SparseRowMatrix to make it 
> work without Kryo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1786) Make classes implements Serializable for Spark 1.5+

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1786:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Make classes implements Serializable for Spark 1.5+
> ---
>
> Key: MAHOUT-1786
> URL: https://issues.apache.org/jira/browse/MAHOUT-1786
> Project: Mahout
>  Issue Type: Improvement
>  Components: Math
>Affects Versions: 0.11.0
>Reporter: Michel Lemay
>Assignee: Pat Ferrel
>Priority: Minor
>  Labels: performance
> Fix For: 0.13.2
>
>
> Spark 1.5 comes with a new, very efficient serializer that uses code 
> generation.  It is twice as fast as Kryo.  When using Mahout, we have to set 
> KryoSerializer because some classes aren't serializable otherwise.  
> I suggest declaring Math classes as "implements Serializable" where needed.  
> For instance, to use the cooccurrence package in Spark 1.5, we had to modify 
> AbstractMatrix, AbstractVector, DenseVector and SparseRowMatrix to make it 
> work without Kryo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (MAHOUT-1953) jars in $MAHOUT_HOME should be deleted on mvn clean

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant resolved MAHOUT-1953.
--
Resolution: Fixed

> jars in $MAHOUT_HOME should be deleted on mvn clean
> ---
>
> Key: MAHOUT-1953
> URL: https://issues.apache.org/jira/browse/MAHOUT-1953
> Project: Mahout
>  Issue Type: Bug
>Reporter: Trevor Grant
> Fix For: 0.13.1
>
>
> MAHOUT-1950 copies jars to $MAHOUT_HOME to be picked up.  
> They should be deleted by mvn clean.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1978) Implement "a provably good seeding method for k-means"

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1978:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Implement "a provably good seeding method for k-means"
> --
>
> Key: MAHOUT-1978
> URL: https://issues.apache.org/jira/browse/MAHOUT-1978
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.13.0
>Reporter: Andrew Palumbo
> Fix For: 0.13.2
>
>
> When building out our algorithm library, implement {{ASSUMPTION-FREE K-MC^2}} 
> from 
> https://papers.nips.cc/paper/6478-fast-and-provably-good-seedings-for-k-means.pdf



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1942) Algorithms should be applicable to incore matrices

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1942:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Algorithms should be applicable to incore matrices
> --
>
> Key: MAHOUT-1942
> URL: https://issues.apache.org/jira/browse/MAHOUT-1942
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
> Fix For: 0.13.2
>
>
> The functions in `org.apache.mahout.math.algorithms` should be able to be 
> applied to incore matrices as well.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MAHOUT-1942) Algorithms should be applicable to incore matrices

2017-06-22 Thread Trevor Grant (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060394#comment-16060394
 ] 

Trevor Grant commented on MAHOUT-1942:
--

Move to close this as "won't fix".

It has become evident that the majority of algorithms fall into one of two 
categories:

1) Highly parallelizable: an incore version is distributed out via a map/reduce 
function, and the distributed version is simply a wrapper.

2) Not highly parallelizable: in which case the optimal implementations for 
the incore and distributed versions differ significantly. 

By convention then, in the algorithms framework, models and fitters are 
presumed to be distributed unless `InCore` is the prefix of the class name.


> Algorithms should be applicable to incore matrices
> --
>
> Key: MAHOUT-1942
> URL: https://issues.apache.org/jira/browse/MAHOUT-1942
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
> Fix For: 0.13.2
>
>
> The functions in `org.apache.mahout.math.algorithms` should be able to be 
> applied to incore matrices as well.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1933) Migrate website from CMS to Jekyll

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1933:
-
Component/s: website

> Migrate website from CMS to Jekyll
> --
>
> Key: MAHOUT-1933
> URL: https://issues.apache.org/jira/browse/MAHOUT-1933
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation, website
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
> Fix For: 0.13.2
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1933) Migrate website from CMS to Jekyll

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1933:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Migrate website from CMS to Jekyll
> --
>
> Key: MAHOUT-1933
> URL: https://issues.apache.org/jira/browse/MAHOUT-1933
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation, website
>Affects Versions: 0.13.1
>Reporter: Trevor Grant
> Fix For: 0.13.2
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1932) Interchangeable Solvers

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1932:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Interchangeable Solvers 
> 
>
> Key: MAHOUT-1932
> URL: https://issues.apache.org/jira/browse/MAHOUT-1932
> Project: Mahout
>  Issue Type: Improvement
>  Components: Algorithms
>Affects Versions: 0.13.0
>Reporter: Trevor Grant
> Fix For: 0.13.2
>
>
> Currently all algorithms solve in 'closed' form.  
> It would be good to create open-form solvers and optionally invoke them. 
> Two such solvers are Stochastic Gradient Descent and Genetic Algorithms. 
> Add an abstract Solver trait and implementations of at least these two solving 
> mechanisms (to avoid bias in choosing what to include/not include in the 
> Solver trait). 
> Refactor current algorithms to accept an optional Solver.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1908) Create properties for VCL on mac

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1908:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Create properties for VCL on mac 
> -
>
> Key: MAHOUT-1908
> URL: https://issues.apache.org/jira/browse/MAHOUT-1908
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.13.2
>
>
> Create a set of properties to run OMP on Mac (OS X/Darwin and above). OpenMP 
> is not supported directly by Clang's g++ on Mac (in most versions).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1909) Cache Modular Backend solvers after probing

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1909:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Cache Modular Backend solvers after probing
> ---
>
> Key: MAHOUT-1909
> URL: https://issues.apache.org/jira/browse/MAHOUT-1909
> Project: Mahout
>  Issue Type: New Feature
>Affects Versions: 0.13.0
>Reporter: Andrew Palumbo
> Fix For: 0.13.2
>
>
> Modular Backend solvers are lazily instantiated upon their first usage.  
> After their first call, cache the solver in the {{RootSolverFactory}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1914) Spark tests are not picking up OpenCL/OpenMP jars

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1914:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Spark tests are not picking up OpenCL/OpenMP jars
> -
>
> Key: MAHOUT-1914
> URL: https://issues.apache.org/jira/browse/MAHOUT-1914
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.13.2
>
>
> Spark tests are not picking up the correct back-end solver Jars for VCL 
> classes. This is new, and must be fixed before release. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1879) Lazy density analysis of DRMs in CheckpointedDrm

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1879:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Lazy density analysis of DRMs in CheckpointedDrm
> 
>
> Key: MAHOUT-1879
> URL: https://issues.apache.org/jira/browse/MAHOUT-1879
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.13.2
>
>
> Add in a lazy value for density analysis of Checkpointed Drms that can be 
> accessed at Checkpoint Barriers.  
> eg. 
> {code}
> lazy val mxTest: SparseRowMatrix = drmSampleKRows(...)
> lazy val isDense: Boolean = densityAnalysis(mxTest)
> {code} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1870) Add import and export capabilities for DRMs to and from Apache Arrow

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1870:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Add import and export capabilities for DRMs to and from Apache Arrow
> 
>
> Key: MAHOUT-1870
> URL: https://issues.apache.org/jira/browse/MAHOUT-1870
> Project: Mahout
>  Issue Type: New Feature
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
> Fix For: 0.13.2
>
>
> We need to add the capability to import DRMs from and export DRMs to Apache 
> Arrow.   This will be part of a greater effort to make integration with other 
> projects more seamless. In some cases (e.g. exporting to CSV or TSV) we will 
> allow for a loss in precision.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1875) Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1875:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Use faster shallowCopy for dense matrices in blockify drm/package.blockify(..)
> -
>
> Key: MAHOUT-1875
> URL: https://issues.apache.org/jira/browse/MAHOUT-1875
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
> Fix For: 0.13.2
>
>
> In {{sparkbindings.drm/package.blockify(...)}}, after testing the density of 
> an incoming block, use {{DenseMatrix(blockAsArrayOfDoubles, true)}} to 
> shallow copy the backing vector array into the {{DenseMatrix}}.  
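The performance argument rests on a shallow copy reusing the caller's backing arrays instead of cloning them. A plain-Scala sketch of the distinction (the `DenseBlock` class here is hypothetical; Mahout's `DenseMatrix(values, shallowCopy)` constructor is assumed to behave analogously):

```scala
// Sketch of shallow vs. deep copy semantics for a dense block.
// Shallow: reuse the caller's backing arrays (no allocation, O(1)).
// Deep: clone each row (an O(rows * cols) copy).
class DenseBlock(rows: Array[Array[Double]], shallow: Boolean) {
  val data: Array[Array[Double]] =
    if (shallow) rows else rows.map(_.clone())
  def apply(r: Int, c: Int): Double = data(r)(c)
}

val backing  = Array(Array(1.0, 2.0), Array(3.0, 4.0))
val shallowM = new DenseBlock(backing, shallow = true)
val deepM    = new DenseBlock(backing, shallow = false)

backing(0)(0) = 9.0 // mutate the original backing array
```

After the mutation, the shallow view observes the change while the deep copy does not, which is why the shallow path is safe in blockify only once the block owns its arrays.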



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1860) Add Stack Image to the top of the front page of the Website

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1860:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Add Stack Image to the top of the front page of the  Website
> 
>
> Key: MAHOUT-1860
> URL: https://issues.apache.org/jira/browse/MAHOUT-1860
> Project: Mahout
>  Issue Type: Documentation
>  Components: website
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.13.2
>
>
> Add a variant of stack.svg (the image of the Mahout stack, pg. 64 in the 
> book) "above the fold" on the site.  This image seems to help people grasp 
> "what Mahout is" very quickly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1860) Add Stack Image to the top of the front page of the Website

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1860:
-
Component/s: website

> Add Stack Image to the top of the front page of the  Website
> 
>
> Key: MAHOUT-1860
> URL: https://issues.apache.org/jira/browse/MAHOUT-1860
> Project: Mahout
>  Issue Type: Documentation
>  Components: website
>Affects Versions: 0.12.1
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
> Fix For: 0.13.2
>
>
> Add a variant of stack.svg (the image of the Mahout stack, pg. 64 in the 
> book) "above the fold" on the site.  This image seems to help people grasp 
> "what Mahout is" very quickly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1686) Create a documentation page for ALS

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1686:
-
Component/s: website

> Create a documentation page for ALS
> 
>
> Key: MAHOUT-1686
> URL: https://issues.apache.org/jira/browse/MAHOUT-1686
> Project: Mahout
>  Issue Type: Documentation
>  Components: website
>Affects Versions: 0.11.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Musselman
> Fix For: 0.13.2
>
>
> Following the template of the SSVD and QR pages, create a page for ALS. This 
> page would go under Algorithms -> Distributed Matrix Decomposition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1686) Create a documentation page for ALS

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1686:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Create a documentation page for ALS
> 
>
> Key: MAHOUT-1686
> URL: https://issues.apache.org/jira/browse/MAHOUT-1686
> Project: Mahout
>  Issue Type: Documentation
>Affects Versions: 0.11.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Musselman
> Fix For: 0.13.2
>
>
> Following the template of the SSVD and QR pages, create a page for ALS. This 
> page would go under Algorithms -> Distributed Matrix Decomposition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1904) Create a test harness to test mahout across different hardware configurations

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1904:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Create a test harness to test mahout across different hardware configurations
> -
>
> Key: MAHOUT-1904
> URL: https://issues.apache.org/jira/browse/MAHOUT-1904
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
>Priority: Critical
>  Labels: release, test
> Fix For: 0.13.2
>
>
> Create a set of simple Scala programs to be run as a test harness for Linux 
> AMD/Intel, Mac, and AVX2 (default).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1873) Use densityAnalysis() in all necessary operations

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1873:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Use densityAnalysis() in all necessary operations
> -
>
> Key: MAHOUT-1873
> URL: https://issues.apache.org/jira/browse/MAHOUT-1873
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
>Priority: Critical
> Fix For: 0.13.2
>
>
> Find all places in which {{densityAnalysis(...)}} can be used to determine 
> ideal matrix structure and implement it, e.g. in {{ABt}}, {{AtB}}, and 
> possibly the Kryo serializers.  Ensure when doing this that it is not 
> redundant, i.e. that the call is not made by both the Kryo serializer and the 
> distributed operation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1892) Can't broadcast vector in Mahout-Shell

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1892:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Can't broadcast vector in Mahout-Shell
> --
>
> Key: MAHOUT-1892
> URL: https://issues.apache.org/jira/browse/MAHOUT-1892
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.12.2
>Reporter: Trevor Grant
>Assignee: Andrew Palumbo
>Priority: Critical
> Fix For: 0.13.2
>
>
> When attempting to broadcast a Vector in Mahout's spark-shell with `mapBlock` 
> we get serialization errors.  **NOTE** scalars can be broadcast without issue.
> I did some testing in the "Zeppelin Shell" for lack of a better term.  See 
> https://github.com/apache/zeppelin/pull/928
> The same `mapBlock` code that I ran in the spark-shell (below) also generated 
> errors.  However, wrapping the mapBlock into a function in a compiled jar 
> https://github.com/apache/mahout/pull/246/commits/ccb5da65330e394763928f6dc51d96e38debe4fb#diff-4a952e8e09ae07e0b3a7ac6a5d6b2734R25
>  and then running said function from the Mahout Shell or in the "Zeppelin 
> Shell" (using Spark or Flink as a runner) works fine.  
> Consider
> ```
> mahout> val inCoreA = dense((1, 2, 3), (3, 4, 5))
> val A = drmParallelize(inCoreA)
> val v: Vector = dvec(1,1,1)
> val bcastV = drmBroadcast(v)
> val drm2 = A.mapBlock() {
> case (keys, block) =>
> for(row <- 0 until block.nrow) block(row, ::) -= bcastV
> keys -> block
> }
> drm2.checkpoint()
> ```
> Which emits the stack trace:
> ```
> org.apache.spark.SparkException: Task not serializable
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
> at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:318)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:317)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
> at org.apache.spark.rdd.RDD.map(RDD.scala:317)
> at 
> org.apache.mahout.sparkbindings.blas.MapBlock$.exec(MapBlock.scala:33)
> at 
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:338)
> at 
> org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:116)
> at 
> org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:41)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:58)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:68)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:70)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:72)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:74)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:76)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:78)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:80)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:82)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:84)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:86)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:88)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:90)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:92)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:94)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:96)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:98)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:100)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:102)
> at $iwC$$iwC$$iwC$$iwC$$iwC.(:104)
> at $iwC$$iwC$$iwC$$iwC.(:106)
> at $iwC$$iwC$$iwC.(:108)
> 

[jira] [Updated] (MAHOUT-1882) SequentialAccessSparseVector iterateNonZero is incorrect.

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1882:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> SequentialAccessSparseVector iterateNonZero is incorrect.
> --
>
> Key: MAHOUT-1882
> URL: https://issues.apache.org/jira/browse/MAHOUT-1882
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Priority: Critical
> Fix For: 0.13.2
>
>
> In {{SequentialAccessSparseVector}} a bug is noted: when counting non-zero 
> elements, {{NonDefaultIterator}} can, under certain circumstances, return an 
> iterator whose size differs from the actual non-zero count.
> {code}
>  @Override
>   public Iterator iterateNonZero() {
> // TODO: this is a bug, since nonDefaultIterator doesn't hold to non-zero 
> contract.
> return new NonDefaultIterator();
>   }
> {code}
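A plain-Scala sketch of why an iterator over "non-default" slots breaks the non-zero contract, and how filtering restores it. This models the stored values with a bare array; it is not Mahout's `SequentialAccessSparseVector`:

```scala
// A sparse vector physically stores "non-default" slots, but an explicit
// 0.0 can remain stored (e.g. after an in-place update sets a slot to zero).
val storedValues = Array(1.0, 0.0, 3.0) // 0.0 is explicitly stored

// "non-default" iterator: everything physically stored, zeros included --
// this is the contract NonDefaultIterator actually satisfies
val nonDefault = storedValues.iterator

// a contract-respecting non-zero iterator simply filters out zeros
val nonZero = storedValues.iterator.filter(_ != 0.0)
```

One possible fix, in these terms, is to wrap the non-default iterator in an equivalent non-zero filter.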



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1791) Automatic threading for java based mmul in the front end and the backend.

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1791:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Automatic threading for java based mmul in the front end and the backend.
> -
>
> Key: MAHOUT-1791
> URL: https://issues.apache.org/jira/browse/MAHOUT-1791
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.11.1, 0.12.0, 0.11.2
>Reporter: Dmitriy Lyubimov
>Assignee: Andrew Musselman
>Priority: Critical
> Fix For: 0.13.2
>
>
> As we know, we are still struggling with the decision of which path to take 
> for bare-metal acceleration of in-core math. 
> Meanwhile, a simple no-brainer improvement is to add decision paths 
> and apply multithreaded matrix-matrix multiplication (and maybe others, but 
> mmul is perhaps the most prominent beneficiary at the moment, being both easy 
> to do and likely to yield a statistically significant improvement). 
> So adding multithreaded logic to mmul is one path. 
> Another path is automatic adjustment of multithreading. 
> In the front end, we probably want to utilize all available cores. 
> In the backend, we can oversubscribe cores, but doing so by more than 2x or 
> 3x is inadvisable because of diminishing returns driven by the growing 
> likelihood of context-switching overhead.
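A rough sketch of the row-partitioned multithreaded mmul path described above, with the oversubscription factor as a parameter. This is illustrative only, not Mahout's actual mmul code:

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Row-partitioned parallel matrix multiply. Thread count is
// cores * oversubscribe; per the discussion, keeping oversubscribe <= 2-3
// in the backend avoids context-switching overhead.
def parMmul(a: Array[Array[Double]], b: Array[Array[Double]],
            oversubscribe: Int = 1): Array[Array[Double]] = {
  val (n, k, m) = (a.length, b.length, b(0).length)
  val c = Array.ofDim[Double](n, m)
  val threads =
    math.max(1, Runtime.getRuntime.availableProcessors * oversubscribe)
  val pool = Executors.newFixedThreadPool(threads)
  // each task owns one output row, so no synchronization is needed
  for (i <- 0 until n) pool.execute { () =>
    for (j <- 0 until m; x <- 0 until k) c(i)(j) += a(i)(x) * b(x)(j)
  }
  pool.shutdown()
  pool.awaitTermination(1, TimeUnit.MINUTES)
  c
}
```

Per the thread, a front-end caller would pass `oversubscribe = 1` (use all cores), while a backend task might use 2 or 3.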



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1851) Automatic probing of in-core and back-end solvers

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1851:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Automatic probing of in-core and back-end solvers
> -
>
> Key: MAHOUT-1851
> URL: https://issues.apache.org/jira/browse/MAHOUT-1851
> Project: Mahout
>  Issue Type: New Feature
>Affects Versions: 0.12.0
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
>Priority: Critical
> Fix For: 0.13.2
>
>
> In general, as we potentially expand the collection of in-core and 
> distributed solvers relying on particular sw/hw capabilities installed (lib 
> blas, viennacl, cuda), it would be nice to have automatic and centralized 
> capability probing  and some sort of registry/framework that enumerates 
> enabled features.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MAHOUT-1918) Use traits when probing VCL

2017-06-22 Thread Trevor Grant (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Grant updated MAHOUT-1918:
-
Fix Version/s: (was: 0.13.1)
   0.13.2

> Use traits when probing VCL
> ---
>
> Key: MAHOUT-1918
> URL: https://issues.apache.org/jira/browse/MAHOUT-1918
> Project: Mahout
>  Issue Type: Test
>Affects Versions: 0.12.2
>Reporter: Andrew Palumbo
>Assignee: Andrew Palumbo
>Priority: Blocker
> Fix For: 0.13.2
>
>
> Currently, to instantiate a solver, we have e.g.:
> {code} 
>   clazz = 
> Class.forName("org.apache.mahout.viennacl.opencl.GPUMMul$").getField("MODULE$").get(null).asInstanceOf[MMBinaryFunc]
> {code}
> The result is cast to {{MMBinaryFunc}}.  Instead, cast it to an MMulSolver 
> trait and change the corresponding class GPUMMul to extend that trait.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Density based Clustering in Mahout

2017-06-22 Thread Aditya
Hello everyone,

I've been working for the past few weeks on coding an in-core DBSCAN
algorithm.

A more efficient version with O(n log n) complexity does exist, but it
uses the R-Tree data structure to index the data. I have a few
concerns/questions, and I'm hoping you can help me out.

1. To my knowledge, using an indexing data structure like an R-Tree
or a kd-tree is the only way to reduce the complexity of the DBSCAN
algorithm. If there's any other method you are familiar with, please
let me know.

2. From what I've read in the book Apache Mahout: Beyond MapReduce, written
by Andrew and Dmitriy, I don't see how I can directly exploit the existing
data structures and operations to get the functionality of an R-Tree.

3. On the off chance that an R-Tree module has to be built in Mahout because
it is not possible to easily reuse existing operations, I need some insight
into how to go about it. I learned that everything in Mahout should
ultimately be serializable to a vector; I can't see how to do that
intuitively in the case of an R-Tree.
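For reference, the naive O(n^2) in-core DBSCAN, where the eps-neighborhood query is a linear scan (the piece an R-Tree or kd-tree would accelerate), can be sketched as follows. This assumes 2-D points and is not existing Mahout code:

```scala
import scala.collection.mutable

type Point = (Double, Double)

def dist(p: Point, q: Point): Double =
  math.sqrt(math.pow(p._1 - q._1, 2) + math.pow(p._2 - q._2, 2))

// Naive in-core DBSCAN: returns a cluster id per point, -1 for noise.
def dbscan(pts: IndexedSeq[Point], eps: Double, minPts: Int): Array[Int] = {
  val NOISE = -1; val UNSEEN = -2
  val labels = Array.fill(pts.length)(UNSEEN)
  var cluster = -1

  // the O(n) scan an R-Tree would answer in roughly O(log n)
  def neighbors(i: Int): Seq[Int] =
    pts.indices.filter(j => dist(pts(i), pts(j)) <= eps)

  for (i <- pts.indices if labels(i) == UNSEEN) {
    val seeds = neighbors(i)
    if (seeds.length < minPts) labels(i) = NOISE // not a core point (yet)
    else {
      cluster += 1
      labels(i) = cluster
      val queue = mutable.Queue(seeds: _*)
      while (queue.nonEmpty) {
        val j = queue.dequeue()
        if (labels(j) == NOISE) labels(j) = cluster // border point
        if (labels(j) == UNSEEN) {
          labels(j) = cluster
          val js = neighbors(j)
          if (js.length >= minPts) queue ++= js // j is core: expand
        }
      }
    }
  }
  labels
}
```

Replacing `neighbors` with an indexed spatial query is exactly what brings the overall complexity down to O(n log n).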

There are a couple of other concerns that need to be discussed but these
are vital at the moment.

PS: The research paper that proposed the DBSCAN algorithm can be found here
.

Thanks!

-Aditya


Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
and contributors' convenience should be golden IMO. I remember experiencing
a mild irritation when I was asked to resolve conflicts on Spark PRs,
because I felt they arose solely because the committer was taking too long
to review my PR and OK it. But if they had resulted from the project not
following the simple KISS GitHub PR workflow, it probably would have been a
bigger turn-off.

And then imagine the overhead of explaining to every newcomer that they
should PR not against master but against something else, and why, when
every other ASF project accepts PRs against master...

I dunno... when working on GitHub, any deviation from commonly accepted
PR flows would IMO be a fatal wound to the process.

On Thu, Jun 22, 2017 at 4:13 PM, Dmitriy Lyubimov  wrote:

> should read
>
> And then you will face the dilemma whether to ask people to resolve merge
> issues w.r.t. *dev* and resubmit against *dev*, which will result to high
> contribtors' attrition, or resolve them yourself without deep knowledge of
> the author's intent, which will result in delays and plain errors.
>
> On Thu, Jun 22, 2017 at 2:48 PM, Dmitriy Lyubimov 
> wrote:
>
>>
>>
>> On Wed, Jun 21, 2017 at 3:00 PM, Pat Ferrel 
>> wrote:
>>
>>> Which is an option part of git flow but maybe take a look at a better
>>> explanation than mine: http://nvie.com/posts/a-succes
>>> sful-git-branching-model/ >> ssful-git-branching-model/>
>>>
>>> I still don’t see how this complicates resolving conflicts. It just
>>> removes the resolution from being a blocker. If some conflict is pushed to
>>> master the project is dead until it is resolved (how often have we seen
>>> this?)
>>
>>
>> This is completely detached from github reality.
>>
>> In this model, all contributors work actually on the same branch. In
>> github, every contributor will fork off their own dev branch.
>>
>> In this model, people start with a fork off the dev branch and push to
>> dev branch. In github, a contributor will fork off the master branch and
>> will PR against master branch. This is default behavior and my gut feeling
>> no amount of forewarning is going to change that w.r.t. contributors. And
>> if one starts off his/her work with the branch with intent to commit to
>> another, then conflict is guaranteed every time he or she changes the file
>> that has been changed on the branch to be merged to.
>>
>> For example:
>> Master is at A
>> Dev branch is at A - B -C ... F.
>>
>> if I start working at master (A) then i wil generate conflicts if i have
>> changed same files (lines) as in B, C, .. or F.
>>
>> If I start working at dev (F) then i will not have a chance to generate
>> conflicts with B,C,..F but only with commits that happened after i had
>> started.
>>
>> Also, if I start working at master (A) then github flow will suggest me
>> to merge into master during PR. I guarantee 100% of  first time PRs will
>> trip on that in github. even if you put "start your work off dev not
>> master" 20 times into project readme.
>>
>> And then you will face the dilemma whether to ask people to resolve merge
>> issues w.r.t. master and resubmit, which will result to high contribtors'
>> attrition, or resolve them yourself without deep knowledge of the author's
>> intent, which will result in delays and plain errors.
>>
>> -d
>>
>
>


Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
should read

And then you will face the dilemma of whether to ask people to resolve merge
issues w.r.t. *dev* and resubmit against *dev*, which will result in high
contributors' attrition, or resolve them yourself without deep knowledge of
the author's intent, which will result in delays and plain errors.

On Thu, Jun 22, 2017 at 2:48 PM, Dmitriy Lyubimov  wrote:

>
>
> On Wed, Jun 21, 2017 at 3:00 PM, Pat Ferrel  wrote:
>
>> Which is an option part of git flow but maybe take a look at a better
>> explanation than mine: http://nvie.com/posts/a-succes
>> sful-git-branching-model/ > ssful-git-branching-model/>
>>
>> I still don’t see how this complicates resolving conflicts. It just
>> removes the resolution from being a blocker. If some conflict is pushed to
>> master the project is dead until it is resolved (how often have we seen
>> this?)
>
>
> This is completely detached from github reality.
>
> In this model, all contributors work actually on the same branch. In
> github, every contributor will fork off their own dev branch.
>
> In this model, people start with a fork off the dev branch and push to dev
> branch. In github, a contributor will fork off the master branch and will
> PR against master branch. This is default behavior and my gut feeling no
> amount of forewarning is going to change that w.r.t. contributors. And if
> one starts off his/her work with the branch with intent to commit to
> another, then conflict is guaranteed every time he or she changes the file
> that has been changed on the branch to be merged to.
>
> For example:
> Master is at A
> Dev branch is at A - B -C ... F.
>
> if I start working at master (A) then i wil generate conflicts if i have
> changed same files (lines) as in B, C, .. or F.
>
> If I start working at dev (F) then i will not have a chance to generate
> conflicts with B,C,..F but only with commits that happened after i had
> started.
>
> Also, if I start working at master (A) then github flow will suggest me to
> merge into master during PR. I guarantee 100% of  first time PRs will trip
> on that in github. even if you put "start your work off dev not master" 20
> times into project readme.
>
> And then you will face the dilemma whether to ask people to resolve merge
> issues w.r.t. master and resubmit, which will result to high contribtors'
> attrition, or resolve them yourself without deep knowledge of the author's
> intent, which will result in delays and plain errors.
>
> -d
>


Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
On Wed, Jun 21, 2017 at 3:00 PM, Pat Ferrel  wrote:

> Which is an option part of git flow but maybe take a look at a better
> explanation than mine: http://nvie.com/posts/a-successful-git-branching-
> model/ 
>
> I still don’t see how this complicates resolving conflicts. It just
> removes the resolution from being a blocker. If some conflict is pushed to
> master the project is dead until it is resolved (how often have we seen
> this?)


This is completely detached from GitHub reality.

In this model, all contributors actually work on the same branch. On
GitHub, every contributor will fork off their own dev branch.

In this model, people start with a fork off the dev branch and push to the
dev branch. On GitHub, a contributor will fork off the master branch and
will PR against the master branch. This is the default behavior, and my gut
feeling is that no amount of forewarning is going to change that w.r.t.
contributors. And if one starts work on one branch with the intent to
commit to another, then a conflict is guaranteed every time he or she
changes a file that has been changed on the branch to be merged to.

For example:
Master is at A
Dev branch is at A - B - C ... F.

If I start working at master (A), then I will generate conflicts if I have
changed the same files (lines) as in B, C, ..., or F.

If I start working at dev (F), then I will not have a chance to generate
conflicts with B, C, ..., F, but only with commits that happened after I
started.

Also, if I start working at master (A), then the GitHub flow will suggest I
merge into master during the PR. I guarantee 100% of first-time PRs will
trip on that in GitHub, even if you put "start your work off dev, not
master" 20 times into the project readme.

And then you will face the dilemma of whether to ask people to resolve merge
issues w.r.t. master and resubmit, which will result in high contributors'
attrition, or resolve them yourself without deep knowledge of the author's
intent, which will result in delays and plain errors.

-d


Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Pat Ferrel
And all this leads me to think that the concerns/worries may not really be 
warranted; this process just codifies best practices and adds one new thing, 
which is "develop" as the default WIP branch.


On Jun 22, 2017, at 10:47 AM, Pat Ferrel  wrote:

Which translates into exactly what you suggest if we are maintaining release 
branches.


On Jun 22, 2017, at 10:45 AM, Pat Ferrel  wrote:

Actually I think git flow would merge it into master and tag it with an 
annotated tag like “0.13.0.jira-123” to reference the bug fix or some other 
naming scheme. Since the bug is “important” it is treated like what the blog 
post calls a “hotfix” so the head of master is still stable with hotfixes 
applied even if the merge does not warrant a binary release.

The master branch hygiene is maintained by checking WIP into develop or a 
feature branch, hotfixes and releases go into master. There is also a mechanism 
to maintain release branches if the project warrants, which may be true of 
Mahout.


On Jun 21, 2017, at 3:25 PM, Trevor Grant  wrote:

So right now, if there was a bug in 0.13.0 that needed an important patch-
why not just merge it into master and  git branch "branch-0.13.0"

On Wed, Jun 21, 2017 at 4:26 PM, Dmitriy Lyubimov  wrote:

> PS. but i see the rational. to have stable fixes to get into release.
> perhaps named release branches is still a way to go if one cuts them early
> enough.
> 
> On Wed, Jun 21, 2017 at 2:25 PM, Dmitriy Lyubimov 
> wrote:
> 
>> 
>> 
>> On Wed, Jun 21, 2017 at 2:17 PM, Pat Ferrel 
> wrote:
>> 
>> Since merges are done by committers, it’s easy to retarget a
> contributor’s
>>> PRs but committers would PR against develop,
>> 
>> IMO it is anything but easy to resolve conflicts, let alone somebody
>> else's. Spark just asks me to resolve them myself. But if you don't have
>> proper target, you can't ask the contributor.
>> 
>> and some projects like PredictionIO make develop the default branch on
>>> github so it's the one contributors get by default.
>>> 
>> That would fix it but i am not sure if we have access to HEAD on github
>> mirror. Might involve INFRA to do it  And in that case  it would amount
>> little more but renaming. It would seem it is much easier to create a
>> branch, "stable master" or something, and consider master to be ongoing
> PR
>> base.
>> 
>> -1 on former, -0 on the latter. Judging from the point of both
> contributor
>> and committer (of which I am both).it will not make my life easy on
> either
>> end.
>> 
>> 
> 





Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Pat Ferrel
Which translates into exactly what you suggest if we are maintaining release 
branches.


On Jun 22, 2017, at 10:45 AM, Pat Ferrel  wrote:

Actually I think git flow would merge it into master and tag it with an 
annotated tag like “0.13.0.jira-123” to reference the bug fix or some other 
naming scheme. Since the bug is “important” it is treated like what the blog 
post calls a “hotfix” so the head of master is still stable with hotfixes 
applied even if the merge does not warrant a binary release.

The master branch hygiene is maintained by checking WIP into develop or a 
feature branch, hotfixes and releases go into master. There is also a mechanism 
to maintain release branches if the project warrants, which may be true of 
Mahout.


On Jun 21, 2017, at 3:25 PM, Trevor Grant  wrote:

So right now, if there was a bug in 0.13.0 that needed an important patch-
why not just merge it into master and  git branch "branch-0.13.0"

On Wed, Jun 21, 2017 at 4:26 PM, Dmitriy Lyubimov  wrote:

> PS. but i see the rational. to have stable fixes to get into release.
> perhaps named release branches is still a way to go if one cuts them early
> enough.
> 
> On Wed, Jun 21, 2017 at 2:25 PM, Dmitriy Lyubimov 
> wrote:
> 
>> 
>> 
>> On Wed, Jun 21, 2017 at 2:17 PM, Pat Ferrel 
> wrote:
>> 
>> Since merges are done by committers, it’s easy to retarget a
> contributor’s
>>> PRs but committers would PR against develop,
>> 
>> IMO it is anything but easy to resolve conflicts, let alone somebody
>> else's. Spark just asks me to resolve them myself. But if you don't have
>> proper target, you can't ask the contributor.
>> 
>> and some projects like PredictionIO make develop the default branch on
>>> github so it's the one contributors get by default.
>>> 
>> That would fix it but i am not sure if we have access to HEAD on github
>> mirror. Might involve INFRA to do it  And in that case  it would amount
>> little more but renaming. It would seem it is much easier to create a
>> branch, "stable master" or something, and consider master to be ongoing
> PR
>> base.
>> 
>> -1 on former, -0 on the latter. Judging from the point of both
> contributor
>> and committer (of which I am both).it will not make my life easy on
> either
>> end.
>> 
>> 
> 




Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Pat Ferrel
Actually I think git flow would merge it into master and tag it with an 
annotated tag like “0.13.0.jira-123” to reference the bug fix or some other 
naming scheme. Since the bug is “important” it is treated like what the blog 
post calls a “hotfix” so the head of master is still stable with hotfixes 
applied even if the merge does not warrant a binary release.

The master branch hygiene is maintained by checking WIP into develop or a 
feature branch, hotfixes and releases go into master. There is also a mechanism 
to maintain release branches if the project warrants, which may be true of 
Mahout.


On Jun 21, 2017, at 3:25 PM, Trevor Grant  wrote:

So right now, if there was a bug in 0.13.0 that needed an important patch-
why not just merge it into master and  git branch "branch-0.13.0"

On Wed, Jun 21, 2017 at 4:26 PM, Dmitriy Lyubimov  wrote:

> PS. but i see the rational. to have stable fixes to get into release.
> perhaps named release branches is still a way to go if one cuts them early
> enough.
> 
> On Wed, Jun 21, 2017 at 2:25 PM, Dmitriy Lyubimov 
> wrote:
> 
>> 
>> 
>> On Wed, Jun 21, 2017 at 2:17 PM, Pat Ferrel 
> wrote:
>> 
>>> Since merges are done by committers, it’s easy to retarget a
>>> contributor’s PRs, but committers would PR against develop,
>> 
>> IMO it is anything but easy to resolve conflicts, let alone somebody
>> else's. Spark just asks me to resolve them myself. But if you don't have a
>> proper target, you can't ask the contributor.
>> 
>>> and some projects like PredictionIO make develop the default branch on
>>> GitHub, so it's the one contributors get by default.
>>> 
>> That would fix it, but I am not sure if we have access to HEAD on the
>> GitHub mirror. It might involve INFRA to do it, and in that case it would
>> amount to little more than renaming. It would seem much easier to create a
>> branch, "stable master" or something, and consider master to be the
>> ongoing PR base.
>> 
>> -1 on the former, -0 on the latter. Judging from the point of view of
>> both contributor and committer (of which I am both), it will not make my
>> life easy on either end.
>> 
>> 
> 



[jira] [Commented] (MAHOUT-1988) scala 2.10 is hardcoded somewhere

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059620#comment-16059620
 ] 

ASF GitHub Bot commented on MAHOUT-1988:


Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/326
  
FWIW, this doesn't actually fix the dependency issue. I think I'll make a 
new JIRA ticket for this, then downgrade MAHOUT-1988 to trivial.



>  scala 2.10 is hardcoded somewhere
> --
>
> Key: MAHOUT-1988
> URL: https://issues.apache.org/jira/browse/MAHOUT-1988
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Andrew Palumbo
>Assignee: Trevor Grant
>Priority: Blocker
> Fix For: 0.13.1
>
>
> After building mahout against scala 2.11: 
> {code}
> mvn clean install -Dscala.version=2.11.4 -Dscala.compat.version=2.11 
> -Phadoop2  -DskipTests
> {code}
> ViennaCL jars are built hard-coded to scala 2.10.  This is currently blocking 
> the 0.13.1 release. 
> {code}
> mahout-h2o_2.11-0.13.1-SNAPSHOT.jar
> mahout-hdfs-0.13.1-SNAPSHOT.jar
> mahout-math-0.13.1-SNAPSHOT.jar
> mahout-math-scala_2.11-0.13.1-SNAPSHOT.jar
> mahout-mr-0.13.1-SNAPSHOT.jar
> mahout-native-cuda_2.10-0.13.0-SNAPSHOT.jar
> mahout-native-cuda_2.10-0.13.1-SNAPSHOT.jar
> mahout-native-viennacl_2.10-0.13.1-SNAPSHOT.jar
> mahout-native-viennacl-omp_2.10-0.13.1-SNAPSHOT.jar
> mahout-spark_2.11-0.13.1-SNAPSHOT-dependency-reduced.jar
> mahout-spark_2.11-0.13.1-SNAPSHOT.jar
> {code} 
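The mismatch in the listing above can be spotted mechanically. A minimal sketch, using the jar names quoted in the issue (in a real checkout one would glob over the module target/ directories instead of a hard-coded list):

```shell
# Flag built Mahout jars whose artifact name still carries the _2.10 suffix
# after a Scala 2.11 build. The jar names below are copied from the issue;
# the flat listing is an assumption standing in for the real build output.
jars='mahout-math-scala_2.11-0.13.1-SNAPSHOT.jar
mahout-native-cuda_2.10-0.13.1-SNAPSHOT.jar
mahout-native-viennacl_2.10-0.13.1-SNAPSHOT.jar
mahout-native-viennacl-omp_2.10-0.13.1-SNAPSHOT.jar
mahout-spark_2.11-0.13.1-SNAPSHOT.jar'

bad=$(printf '%s\n' "$jars" | grep '_2\.10')
printf '%s\n' "$bad"
```

An empty result would indicate every Scala-suffixed artifact picked up the requested scala.compat.version.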



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)