[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14384





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75440356
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,146 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param ratingCol column name for ratings.
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers.
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers.
+#' @param rank rank of the matrix factorization (> 0).
+#' @param reg regularization parameter (>= 0).
+#' @param maxIter maximum number of iterations (>= 0).
+#' @param nonnegative logical value indicating whether to apply nonnegativity constraints.
+#' @param implicitPrefs logical value indicating whether to use implicit preference.
+#' @param alpha alpha parameter in the implicit preference formulation (>= 0).
+#' @param seed integer seed for random number generation.
+#' @param numUserBlocks number of user blocks used to parallelize computation (> 0).
+#' @param numItemBlocks number of item blocks used to parallelize computation (> 0).
+#' @param checkpointInterval number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame-method
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#'                     reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, nonnegative = FALSE,
+                   implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10,
--- End diff --

It is, you are correct. It just reads a little more clearly that way. Thanks!





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75408767
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,146 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param ratingCol column name for ratings.
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers.
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers.
+#' @param rank rank of the matrix factorization (> 0).
+#' @param reg regularization parameter (>= 0).
+#' @param maxIter maximum number of iterations (>= 0).
+#' @param nonnegative logical value indicating whether to apply nonnegativity constraints.
+#' @param implicitPrefs logical value indicating whether to use implicit preference.
+#' @param alpha alpha parameter in the implicit preference formulation (>= 0).
+#' @param seed integer seed for random number generation.
+#' @param numUserBlocks number of user blocks used to parallelize computation (> 0).
+#' @param numItemBlocks number of item blocks used to parallelize computation (> 0).
+#' @param checkpointInterval number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame-method
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#'                     reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, nonnegative = FALSE,
+                   implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10,
--- End diff --

In fact it doesn't matter: R's default numeric type is double, but it is clearer to differentiate it from the other integer parameters. Done, thanks!
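A quick aside on the typing point, checked in plain R:

    # Unsuffixed numeric literals in R are doubles; only the L suffix makes an integer.
    typeof(1)          # "double"
    typeof(1.0)        # "double"
    typeof(1L)         # "integer"
    identical(1, 1.0)  # TRUE, so alpha = 1 and alpha = 1.0 pass the same value

So `alpha = 1.0` only signals intent to the reader; it does not change the value sent to the JVM.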





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75290436
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,146 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param ratingCol column name for ratings.
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers.
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers.
+#' @param rank rank of the matrix factorization (> 0).
+#' @param reg regularization parameter (>= 0).
+#' @param maxIter maximum number of iterations (>= 0).
+#' @param nonnegative logical value indicating whether to apply nonnegativity constraints.
+#' @param implicitPrefs logical value indicating whether to use implicit preference.
+#' @param alpha alpha parameter in the implicit preference formulation (>= 0).
+#' @param seed integer seed for random number generation.
+#' @param numUserBlocks number of user blocks used to parallelize computation (> 0).
+#' @param numItemBlocks number of item blocks used to parallelize computation (> 0).
+#' @param checkpointInterval number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame-method
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#'                     reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, nonnegative = FALSE,
+                   implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10,
--- End diff --

should `alpha = 1.0`?





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75273184
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -454,4 +454,61 @@ test_that("spark.survreg", {
   }
 })
 
+test_that("spark.als", {
+  # R code to reproduce the result.
+  #
+  #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+  #'              list(2, 1, 1.0), list(2, 2, 5.0))
+  #' df <- createDataFrame(data, c("user", "item", "rating"))
+  #' model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item",
+  #'                    rank = 10, maxIter = 5, seed = 0)
+  #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item"))
+  #' predict(model, test)
+  #
+  # -- output of 'predict(model, data)'
+  #
+  # user item   prediction
+  #    0    2   -0.1380762
+  #    1    0    2.6258414
+  #    2    0   -1.5018409
--- End diff --

Yeah that makes sense. It seems `spark.survreg` and `spark.naiveBayes` also 
have such comment blocks.





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75272624
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
+    val rRatingCol: String,
+    val rUserCol: String,
+    val rItemCol: String,
+    val rRegParam: Double,
+    val rMaxIter: Int) extends MLWritable {
+
+  lazy val rUserFactors: DataFrame = alsm.userFactors
+
+  lazy val rItemFactors: DataFrame = alsm.itemFactors
+
+  lazy val rRank: Int = alsm.rank
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+    alsm.transform(dataset)
+  }
+
+  override def write: MLWriter = new ALSWrapper.ALSWrapperWriter(this)
+}
+
+private[r] object ALSWrapper extends MLReadable[ALSWrapper] {
+
+  def fit(data: DataFrame, features: Array[String], rank: Int, regParam: Double, maxIter: Int,
+      implicitPrefs: Boolean, alpha: Double, nonnegative: Boolean,
+      distParams: Array[Int]): ALSWrapper = {
--- End diff --

Yeah, it was intended to avoid the style check. I have corrected it now.





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75272474
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
+    val rRatingCol: String,
--- End diff --

I followed the naming in `GeneralizedLinearRegressionWrapper`. It seems that we don't actually need that.





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-17 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75233142
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
--- End diff --

Done. Thanks!





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75181798
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -454,4 +454,61 @@ test_that("spark.survreg", {
   }
 })
 
+test_that("spark.als", {
+  # R code to reproduce the result.
+  #
+  #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+  #'              list(2, 1, 1.0), list(2, 2, 5.0))
+  #' df <- createDataFrame(data, c("user", "item", "rating"))
+  #' model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item",
+  #'                    rank = 10, maxIter = 5, seed = 0)
+  #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item"))
+  #' predict(model, test)
+  #
+  # -- output of 'predict(model, data)'
+  #
+  # user item   prediction
+  #    0    2   -0.1380762
+  #    1    0    2.6258414
+  #    2    0   -1.5018409
--- End diff --

It is not helpful if the code to produce the expected answer is the same as the test code. We can either use Scala code or simply remove this block of comments.
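For reference, a sketch of what the Scala-side reproduction could look like (assuming a SparkSession named spark; the expected values in the test would come from actually running this):

    import org.apache.spark.ml.recommendation.ALS
    import spark.implicits._

    // Same toy ratings as in the R test.
    val ratings = Seq((0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0),
      (2, 1, 1.0), (2, 2, 5.0)).toDF("user", "item", "rating")
    val model = new ALS()
      .setUserCol("user").setItemCol("item").setRatingCol("rating")
      .setRank(10).setMaxIter(5).setSeed(0)
      .fit(ratings)
    // Predictions for the same (user, item) pairs the R test checks.
    model.transform(Seq((0, 2), (1, 0), (2, 0)).toDF("user", "item")).show()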





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75179320
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
+    val rRatingCol: String,
+    val rUserCol: String,
+    val rItemCol: String,
+    val rRegParam: Double,
+    val rMaxIter: Int) extends MLWritable {
+
+  lazy val rUserFactors: DataFrame = alsm.userFactors
+
+  lazy val rItemFactors: DataFrame = alsm.itemFactors
+
+  lazy val rRank: Int = alsm.rank
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+    alsm.transform(dataset)
+  }
+
+  override def write: MLWriter = new ALSWrapper.ALSWrapperWriter(this)
+}
+
+private[r] object ALSWrapper extends MLReadable[ALSWrapper] {
+
+  def fit(data: DataFrame, features: Array[String], rank: Int, regParam: Double, maxIter: Int,
+      implicitPrefs: Boolean, alpha: Double, nonnegative: Boolean,
+      distParams: Array[Int]): ALSWrapper = {
--- End diff --

Why not use explicit params? If this is to avoid the style check, you can turn it off with `// scalastyle:off`.
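For reference, a minimal sketch of that escape hatch around a long explicit parameter list (the parameter.number rule id is an assumption about which check fires here):

    // scalastyle:off parameter.number
    def fit(data: DataFrame, ratingCol: String, userCol: String, itemCol: String,
        rank: Int, regParam: Double, maxIter: Int, implicitPrefs: Boolean, alpha: Double,
        nonnegative: Boolean, numUserBlocks: Int, numItemBlocks: Int,
        checkpointInterval: Int, seed: Int): ALSWrapper = {
      // ...
    }
    // scalastyle:on parameter.number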





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75179030
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
+    val rRatingCol: String,
+    val rUserCol: String,
+    val rItemCol: String,
+    val rRegParam: Double,
+    val rMaxIter: Int) extends MLWritable {
+
+  lazy val rUserFactors: DataFrame = alsm.userFactors
+
+  lazy val rItemFactors: DataFrame = alsm.itemFactors
+
+  lazy val rRank: Int = alsm.rank
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+    alsm.transform(dataset)
+  }
+
+  override def write: MLWriter = new ALSWrapper.ALSWrapperWriter(this)
+}
+
+private[r] object ALSWrapper extends MLReadable[ALSWrapper] {
+
+  def fit(data: DataFrame, features: Array[String], rank: Int, regParam: Double, maxIter: Int,
--- End diff --

Chop down the arguments and use 4-space indentation.
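That is, something along these lines (a sketch of the requested layout, keeping the existing parameters):

    def fit(
        data: DataFrame,
        features: Array[String],
        rank: Int,
        regParam: Double,
        maxIter: Int,
        implicitPrefs: Boolean,
        alpha: Double,
        nonnegative: Boolean,
        distParams: Array[Int]): ALSWrapper = {
      // body unchanged
    }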





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75178881
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
+    val rRatingCol: String,
--- End diff --

Btw, can we get those metadata from the model object?
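For what it's worth, part of this metadata is already reachable through the fitted model's Params, e.g. (a sketch, assuming the ALSModel API at the time):

    // userCol and itemCol are Params on ALSModel itself:
    val userCol = alsm.getUserCol
    val itemCol = alsm.getItemCol
    val rank = alsm.rank
    // ratingCol, regParam and maxIter live on the ALS estimator rather than the
    // fitted model, which may be why the wrapper stores them separately.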





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75178780
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
--- End diff --

`alsModel`





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75178434
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.recommendation.{ALS, ALSModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class ALSWrapper private (
+    val alsm: ALSModel,
+    val rRatingCol: String,
--- End diff --

Is it the same as `ratingCol`? Then why do we need the `r` prefix? cc: 
@yanboliang 





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-16 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r75004906
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -454,4 +454,61 @@ test_that("spark.survreg", {
   }
 })
 
+test_that("spark.als", {
+  # R code to reproduce the result.
+  #
+  #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+  #'              list(2, 1, 1.0), list(2, 2, 5.0))
+  #' df <- createDataFrame(data, c("user", "item", "rating"))
+  #' model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item",
+  #'                    rank = 10, maxIter = 5, seed = 0)
+  #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item"))
+  #' predict(model, test)
+  #
+  # -- output of 'predict(model, data)'
+  #
+  # user item   prediction
--- End diff --

I think the usage exposed in this example has mostly been covered by the existing examples. Anything specific in mind?

The algorithm does not guarantee nonnegativity unless that is specified in the arguments. A negative value simply means a low predicted rating, if the ratings in the training data are all nonnegative. In fact, if no constraints are put on the model, the predicted rating can range over all real numbers. An alternative is to use another function to map the value back into the desired region (e.g. 0-5).
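If bounded outputs are wanted, a post-processing sketch in SparkR (assuming a 0-5 rating scale; least/greatest are SparkR column functions):

    predicted <- predict(model, test)
    # Clamp the raw ALS score into [0, 5] in a new column.
    bounded <- withColumn(predicted, "boundedPrediction",
                          least(greatest(column("prediction"), lit(0)), lit(5)))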





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-16 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74999004
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
--- End diff --

Sounds good to me. Thanks!





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74868483
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+#' @param ... additional named argument(s) such as \code{nonnegative}.
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#'                     reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+            if (!is.numeric(rank) || rank <= 0) {
+              stop("rank should be a positive number.")
+            }
+            if (!is.numeric(reg) || reg < 0) {
+              stop("reg should be a nonnegative number.")
+            }
+            if (!is.numeric(maxIter) || maxIter <= 0) {
+              stop("maxIter should be a positive number.")
+            }
+
+            `%||%` <- function(a, b) if (!is.null(a)) a else b
+
+            args <- list(...)
+            numUserBlocks <- args$numUserBlocks %||% 10
+            numItemBlocks <- args$numItemBlocks %||% 10
+            implicitPrefs <- args$implicitPrefs %||% FALSE
+            alpha <- args$alpha %||% 1.0
+            nonnegative <- args$nonnegative %||% FALSE
+            checkpointInterval <- args$checkpointInterval %||% 10
+            seed <- args$seed %||% 0
+
+            features <- array(c(ratingCol, userCol, itemCol))
+            distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74868437
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+#' @param ... additional named argument(s) such as \code{nonnegative}.
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#'                     reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+            if (!is.numeric(rank) || rank <= 0) {
+              stop("rank should be a positive number.")
+            }
+            if (!is.numeric(reg) || reg < 0) {
+              stop("reg should be a nonnegative number.")
+            }
+            if (!is.numeric(maxIter) || maxIter <= 0) {
+              stop("maxIter should be a positive number.")
+            }
+
+            `%||%` <- function(a, b) if (!is.null(a)) a else b
+
+            args <- list(...)
+            numUserBlocks <- args$numUserBlocks %||% 10
+            numItemBlocks <- args$numItemBlocks %||% 10
+            implicitPrefs <- args$implicitPrefs %||% FALSE
+            alpha <- args$alpha %||% 1.0
+            nonnegative <- args$nonnegative %||% FALSE
+            checkpointInterval <- args$checkpointInterval %||% 10
+            seed <- args$seed %||% 0
+
+            features <- array(c(ratingCol, userCol, itemCol))
+            distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74867735
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+#' @param ... additional named argument(s) such as \code{nonnegative}.
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#'                     reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+            if (!is.numeric(rank) || rank <= 0) {
+              stop("rank should be a positive number.")
+            }
+            if (!is.numeric(reg) || reg < 0) {
+              stop("reg should be a nonnegative number.")
+            }
+            if (!is.numeric(maxIter) || maxIter <= 0) {
+              stop("maxIter should be a positive number.")
+            }
+
+            `%||%` <- function(a, b) if (!is.null(a)) a else b
+
+            args <- list(...)
+            numUserBlocks <- args$numUserBlocks %||% 10
+            numItemBlocks <- args$numItemBlocks %||% 10
+            implicitPrefs <- args$implicitPrefs %||% FALSE
+            alpha <- args$alpha %||% 1.0
+            nonnegative <- args$nonnegative %||% FALSE
+            checkpointInterval <- args$checkpointInterval %||% 10
+            seed <- args$seed %||% 0
+
+            features <- array(c(ratingCol, userCol, itemCol))
+            distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74867708
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+#' @param ... additional named argument(s) such as \code{nonnegative}.
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
--- End diff --

seems like this should be `#' @aliases spark.als,SparkDataFrame-method`





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74674616
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
--- End diff --

I see. I think it's ok though: the R ALS package has 16 parameters (https://cran.r-project.org/web/packages/ALS/ALS.pdf).

Since most parameters have default values, people would just omit them. I think it is better to be explicit, because with `...` it is easier to make mistakes on types (e.g. c() vs list()), typos, and so on; see the sketch below.
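A hypothetical illustration of the pitfall (note the misspelled nonegative):

    # With an explicit nonnegative parameter this stops with "unused argument";
    # with ..., the typo is silently swallowed and the default FALSE is used.
    model <- spark.als(df, "rating", "user", "item", nonegative = TRUE)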





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74527111
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
--- End diff --

Remove the empty newline in the roxygen2 block so the block stays contiguous; see the sketch below.
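
For example, keeping the `#'` prefix on the separator line (context copied from the quoted diff):

```r
#' @param maxIter maximum number of iterations (>= 0)
#'
#' @return \code{spark.als} returns a fitted ALS model
```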





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74528252
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), 
list(1, 2, 4.0),
+#' list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#' reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+  function(data, ratingCol = "rating", userCol = "user", itemCol = 
"item",
+   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+if (!is.numeric(rank) || rank <= 0) {
+  stop("rank should be a positive number.")
+}
+if (!is.numeric(reg) || reg < 0) {
+  stop("reg should be a nonnegative number.")
+}
+if (!is.numeric(maxIter) || maxIter <= 0) {
+  stop("maxIter should be a positive number.")
+}
+
+`%||%` <- function(a, b) if (!is.null(a)) a else b
+
+args <- list(...)
+numUserBlocks <- args$numUserBlocks %||% 10
+numItemBlocks <- args$numItemBlocks %||% 10
+implicitPrefs <- args$implicitPrefs %||% FALSE
+alpha <- args$alpha %||% 1.0
+nonnegative <- args$nonnegative %||% FALSE
+checkpointInterval <- args$checkpointInterval %||% 10
+seed <- args$seed %||% 0
+
+features <- array(c(ratingCol, userCol, itemCol))
+distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,
+ checkpointInterval, seed)))
+
+

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-12 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74537542
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
--- End diff --

I was thinking those parameters might not be as important as the ones in
the list, and I don't want the user to feel there are too many variables to tune
for the algorithm?





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-12 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74551161
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -454,4 +454,61 @@ test_that("spark.survreg", {
   }
 })
 
+test_that("spark.als", {
+  # R code to reproduce the result.
+  #
+  #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), 
list(1, 2, 4.0),
+  #'  list(2, 1, 1.0), list(2, 2, 5.0))
+  #' df <- createDataFrame(data, c("user", "item", "rating"))
+  #' model <- spark.als(df, ratingCol = "rating", userCol = "user", 
itemCol = "item",
+  #'rank = 10, maxIter = 5, seed = 0)
+  #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), 
c("user", "item"))
+  #' predict(model, test)
+  #
+  # -- output of 'predict(model, data)'
+  #
+  # user item   prediction
--- End diff --

It would be good to have these examples in the method documentation as well.
Also, how do you interpret a negatively predicted rating?





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-11 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74536489
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), 
list(1, 2, 4.0),
+#' list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#' reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+  function(data, ratingCol = "rating", userCol = "user", itemCol = 
"item",
+   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+if (!is.numeric(rank) || rank <= 0) {
+  stop("rank should be a positive number.")
+}
+if (!is.numeric(reg) || reg < 0) {
+  stop("reg should be a nonnegative number.")
+}
+if (!is.numeric(maxIter) || maxIter <= 0) {
+  stop("maxIter should be a positive number.")
+}
+
+`%||%` <- function(a, b) if (!is.null(a)) a else b
+
+args <- list(...)
+numUserBlocks <- args$numUserBlocks %||% 10
+numItemBlocks <- args$numItemBlocks %||% 10
+implicitPrefs <- args$implicitPrefs %||% FALSE
+alpha <- args$alpha %||% 1.0
+nonnegative <- args$nonnegative %||% FALSE
+checkpointInterval <- args$checkpointInterval %||% 10
+seed <- args$seed %||% 0
+
+features <- array(c(ratingCol, userCol, itemCol))
+distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,
+ checkpointInterval, seed)))
+
+jobj 

[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74527784
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
--- End diff --

Is there a reason we are preferring `...` over naming these out like
`maxIter` in the function definition on L714? If they're well known, it's probably
better to name them (see the sketch below).
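
A small sketch of the trade-off (function names and bodies hypothetical): with `...`, a misspelled argument is silently swallowed, while a parameter named in the signature fails fast:

```r
# (a) extras captured via ..., defaults resolved inside the body
alsDots <- function(data, rank = 10, ...) {
  args <- list(...)
  if (is.null(args$nonnegative)) FALSE else args$nonnegative
}

# (b) the tuning knob named in the signature
alsNamed <- function(data, rank = 10, nonnegative = FALSE) {
  nonnegative
}

alsDots(NULL, nonegative = TRUE)   # typo swallowed by ...: returns FALSE
alsNamed(NULL, nonegative = TRUE)  # Error: unused argument (nonegative = TRUE)
```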





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74527216
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
--- End diff --

Please add documentation for `...`, for example: `@param ... additional
named arguments such as nonnegative`.
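
Something along these lines (wording illustrative):

```r
#' @param ... additional named arguments passed to the method, for example
#'        \code{nonnegative}, \code{implicitPrefs}, \code{alpha}, \code{seed},
#'        \code{numUserBlocks}, \code{numItemBlocks}, \code{checkpointInterval}.
```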





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74116056
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,147 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(ratings)
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#' reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+  function(data, ratingCol = "rating", userCol = "user", itemCol = 
"item",
+   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+`%||%` <- function(a, b) if (!is.null(a)) a else b
+
+args <- list(...)
+numUserBlocks <- args$numUserBlocks %||% 10
+numItemBlocks <- args$numItemBlocks %||% 10
+implicitPrefs <- args$implicitPrefs %||% FALSE
+alpha <- args$alpha %||% 1.0
+nonnegative <- args$nonnegative %||% FALSE
+checkpointInterval <- args$checkpointInterval %||% 10
+seed <- args$seed %||% 0
+
+features <- array(c(ratingCol, userCol, itemCol))
+distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,
+ checkpointInterval, seed)))
+
+jobj <- callJStatic("org.apache.spark.ml.r.ALSWrapper",
+"fit", data@sdf, features, 
as.integer(rank),
+reg, as.integer(maxIter), implicitPrefs, 
alpha, nonnegative,
--- End diff --

Done. Thanks!




[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74112930
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,147 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(ratings)
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#' reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+  function(data, ratingCol = "rating", userCol = "user", itemCol = 
"item",
+   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+`%||%` <- function(a, b) if (!is.null(a)) a else b
--- End diff --

In this case (since we set 7 default values) the code would be a little
repetitive if we expanded every one of them. I guess this definition would not
have side effects on the other functions in SparkR?
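
To illustrate the repetition being avoided (argument names from the quoted diff, values illustrative):

```r
args <- list(alpha = 0.5)   # pretend the caller only passed alpha

# expanded form, repeated once per default:
alpha <- if (is.null(args$alpha)) 1.0 else args$alpha
seed  <- if (is.null(args$seed))  0   else args$seed
# ...and five more of these, vs.

# with the local helper, one line per default:
`%||%` <- function(a, b) if (!is.null(a)) a else b
alpha <- args$alpha %||% 1.0
seed  <- args$seed  %||% 0
```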





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74009550
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,147 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(ratings)
--- End diff --

Good point. Thanks!





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-08 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r73999377
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,147 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(ratings)
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#' reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+  function(data, ratingCol = "rating", userCol = "user", itemCol = 
"item",
+   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+`%||%` <- function(a, b) if (!is.null(a)) a else b
+
+args <- list(...)
+numUserBlocks <- args$numUserBlocks %||% 10
+numItemBlocks <- args$numItemBlocks %||% 10
+implicitPrefs <- args$implicitPrefs %||% FALSE
+alpha <- args$alpha %||% 1.0
+nonnegative <- args$nonnegative %||% FALSE
+checkpointInterval <- args$checkpointInterval %||% 10
+seed <- args$seed %||% 0
+
+features <- array(c(ratingCol, userCol, itemCol))
+distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,
+ checkpointInterval, seed)))
+
+jobj <- callJStatic("org.apache.spark.ml.r.ALSWrapper",
+"fit", data@sdf, features, 
as.integer(rank),
+reg, as.integer(maxIter), implicitPrefs, 
alpha, nonnegative,
--- End diff --

Here, it doesn't check whether `rank` or `maxIter` is positive. Maybe some
checks on the R side would be good.



[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-08 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r73999041
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,147 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(ratings)
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#' reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+  function(data, ratingCol = "rating", userCol = "user", itemCol = 
"item",
+   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+`%||%` <- function(a, b) if (!is.null(a)) a else b
--- End diff --

Usually in SparkR we haven't introduced such operators to solve this type
of problem. Maybe it would be good to follow the convention...





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-08 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r73998522
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,147 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced 
into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced 
into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(ratings)
--- End diff --

@junyangq, I think it would be good if the examples showed how the
`ratings` data.frame is created. Does R have a built-in ratings dataset?





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-07-29 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r72880604
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -454,4 +454,48 @@ test_that("spark.survreg", {
   }
 })
 
+test_that("spark.als", {
+  # R code to reproduce the result.
+  #
+  #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), 
list(1, 2, 4.0),
--- End diff --

For code in comments, try
```
# nolint start
# code
# nolint end
```
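
Applied to the quoted test comment, that would look like:

```r
# nolint start
# data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
#              list(2, 1, 1.0), list(2, 2, 5.0))
# nolint end
```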





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-07-29 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r72880581
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,137 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints}
--- End diff --

Could you add a code example using these additional arguments?





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-07-29 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r72880568
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,137 @@ setMethod("predict", signature(object = 
"AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via 
alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, 
\code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply 
nonnegativity constraints}
+#'\item{implicitPrefs}{logical value indicating whether to use 
implicit preference}
+#'\item{alpha}{alpha parameter in the implicit preference formulation 
(>= 0)}
+#'\item{seed}{seed for random number generation}
+#'\item{numUserBlocks}{number of user blocks used to parallelize 
computation (> 0)}
+#'\item{numItemBlocks}{number of item blocks used to parallelize 
computation (> 0)}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or 
disable checkpoint (-1)}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be cast 
into) integers
--- End diff --

Nit: "cast" is a foreign concept in R; perhaps `coerce`?





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-07-29 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r72880558
  
--- Diff: R/pkg/R/mllib.R ---
@@ -61,7 +68,7 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @name write.ml
 #' @export
 #' @seealso \link{spark.glm}, \link{glm}
-#' @seealso \link{spark.kmeans}, \link{spark.naiveBayes}, 
\link{spark.survreg}
+#' @seealso \link{spark.kmeans}, \link{spark.naiveBayes}, 
\link{spark.survreg}, \link{spark.als}
--- End diff --

Add it to the front since this is "als"?
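
i.e., something like:

```r
#' @seealso \link{spark.als}, \link{spark.kmeans}, \link{spark.naiveBayes}, \link{spark.survreg}
```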





[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-07-27 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14384

[Spark-16443][SparkR] Alternating Least Squares (ALS) wrapper

## What changes were proposed in this pull request?

Add Alternating Least Squares wrapper in SparkR. Unit tests have been 
updated.

## How was this patch tested?

SparkR unit tests.


![screen shot 2016-07-27 at 3 50 46 
pm](https://cloud.githubusercontent.com/assets/15318264/17195348/f7a7d452-5411-11e6-845f-6d292283bc28.png)
![screen shot 2016-07-27 at 3 50 31 
pm](https://cloud.githubusercontent.com/assets/15318264/17195347/f7a6352a-5411-11e6-8e21-61a48070192a.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARK-16443

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14384


commit ecf918546d3d25b60ece78cad42ec41c7c188f3d
Author: Junyang Qian 
Date:   2016-07-08T21:56:31Z

ALS main added to mllib.R

commit 766b55b464263879d90a0cce6f0d3215f3dba1e3
Author: Junyang Qian 
Date:   2016-07-11T20:04:09Z

minimal wrapper

commit e4cd463486577d544ec096fe59a603b33374a11c
Author: Junyang Qian 
Date:   2016-07-21T22:03:25Z

clean comments

commit d0dfe7d21e99b8440de255a6a55f34971d08163f
Author: Junyang Qian 
Date:   2016-07-25T18:31:40Z

first set

commit 7d0c139e427f8f6fef4ed8f6380c966ad998da39
Author: Junyang Qian 
Date:   2016-07-25T18:56:22Z

add spark.als to namespace

commit 243defa8c5a95923cdfab4d306624150d01b6121
Author: Junyang Qian 
Date:   2016-07-27T04:51:27Z

ALS wrapper with summary, predict and write.ml

commit a3cca04c54596ea466cd98edc4ee5e4611f207df
Author: Junyang Qian 
Date:   2016-07-27T17:05:42Z

fix typo in reading model

commit d785b2a93e4e68661b661e08641bbb01dee8875f
Author: Junyang Qian 
Date:   2016-07-27T21:10:52Z

allow more arguments to als

commit 3939ccce37cd899edf007d0ac3b9589748db0bc3
Author: Junyang Qian 
Date:   2016-07-27T22:15:00Z

fix type issue in mllib.R



