[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14384
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75440356 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,146 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' +#' @param data a SparkDataFrame for training. +#' @param ratingCol column name for ratings. +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers. +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers. +#' @param rank rank of the matrix factorization (> 0). +#' @param reg regularization parameter (>= 0). +#' @param maxIter maximum number of iterations (>= 0). +#' @param nonnegative logical value indicating whether to apply nonnegativity constraints. +#' @param implicitPrefs logical value indicating whether to use implicit preference. +#' @param alpha alpha parameter in the implicit preference formulation (>= 0). +#' @param seed integer seed for random number generation. +#' @param numUserBlocks number of user blocks used to parallelize computation (> 0). +#' @param numItemBlocks number of item blocks used to parallelize computation (> 0). +#' @param checkpointInterval number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame-method +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, nonnegative = FALSE, + implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10, --- End diff -- It is - you are correct, it just sort of more clear. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75408767 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,146 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' +#' @param data a SparkDataFrame for training. +#' @param ratingCol column name for ratings. +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers. +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers. +#' @param rank rank of the matrix factorization (> 0). +#' @param reg regularization parameter (>= 0). +#' @param maxIter maximum number of iterations (>= 0). +#' @param nonnegative logical value indicating whether to apply nonnegativity constraints. +#' @param implicitPrefs logical value indicating whether to use implicit preference. +#' @param alpha alpha parameter in the implicit preference formulation (>= 0). +#' @param seed integer seed for random number generation. +#' @param numUserBlocks number of user blocks used to parallelize computation (> 0). +#' @param numItemBlocks number of item blocks used to parallelize computation (> 0). +#' @param checkpointInterval number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame-method +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, nonnegative = FALSE, + implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10, --- End diff -- In fact it doesn't matter? I think R default is double type, but it's more clear to differentiate from other integer parameters. Done thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
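The point about numeric literals is easy to verify in plain R (a quick standalone check, independent of Spark): unsuffixed literals are already doubles, so `alpha = 1` and `alpha = 1.0` are identical defaults, and the `.0` spelling is purely a readability cue to distinguish alpha from the integer-valued parameters.

    typeof(1)          # "double" -- unsuffixed numeric literals are doubles in R
    typeof(1L)         # "integer" -- the L suffix is required to get an integer
    identical(1, 1.0)  # TRUE -- so alpha = 1 and alpha = 1.0 are the same default
    is.integer(10)     # FALSE -- even "integer-looking" defaults such as rank = 10 are doubles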
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75290436 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,146 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' +#' @param data a SparkDataFrame for training. +#' @param ratingCol column name for ratings. +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers. +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers. +#' @param rank rank of the matrix factorization (> 0). +#' @param reg regularization parameter (>= 0). +#' @param maxIter maximum number of iterations (>= 0). +#' @param nonnegative logical value indicating whether to apply nonnegativity constraints. +#' @param implicitPrefs logical value indicating whether to use implicit preference. +#' @param alpha alpha parameter in the implicit preference formulation (>= 0). +#' @param seed integer seed for random number generation. +#' @param numUserBlocks number of user blocks used to parallelize computation (> 0). +#' @param numItemBlocks number of item blocks used to parallelize computation (> 0). +#' @param checkpointInterval number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame-method +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, nonnegative = FALSE, + implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10, --- End diff -- should `alpha = 1.0`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75273184 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,61 @@ test_that("spark.survreg", { } }) +test_that("spark.als", { + # R code to reproduce the result. + # + #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), + #' list(2, 1, 1.0), list(2, 2, 5.0)) + #' df <- createDataFrame(data, c("user", "item", "rating")) + #' model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item", + #'rank = 10, maxIter = 5, seed = 0) + #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item")) + #' predict(model, test) + # + # -- output of 'predict(model, data)' + # + # user item prediction + # 0 2 -0.1380762 + # 1 0 2.6258414 + # 2 0 -1.5018409 --- End diff -- Yeah that makes sense. It seems `spark.survreg` and `spark.naiveBayes` also have such comment blocks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75272624 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, +val rRatingCol: String, +val rUserCol: String, +val rItemCol: String, +val rRegParam: Double, +val rMaxIter: Int) extends MLWritable { + + lazy val rUserFactors: DataFrame = alsm.userFactors + + lazy val rItemFactors: DataFrame = alsm.itemFactors + + lazy val rRank: Int = alsm.rank + + def transform(dataset: Dataset[_]): DataFrame = { +alsm.transform(dataset) + } + + override def write: MLWriter = new ALSWrapper.ALSWrapperWriter(this) +} + +private[r] object ALSWrapper extends MLReadable[ALSWrapper] { + + def fit(data: DataFrame, features: Array[String], rank: Int, regParam: Double, maxIter: Int, + implicitPrefs: Boolean, alpha: Double, nonnegative: Boolean, + distParams: Array[Int]): ALSWrapper = { --- End diff -- Yeah, it was intended to avoid style check. Have corrected it now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75272474 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, +val rRatingCol: String, --- End diff -- I followed the naming in `GeneralizedLinearRegressionWrapper`. It seems that we don't actually need that.
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75233142 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, --- End diff -- Done. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75181798 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,61 @@ test_that("spark.survreg", { } }) +test_that("spark.als", { + # R code to reproduce the result. + # + #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), + #' list(2, 1, 1.0), list(2, 2, 5.0)) + #' df <- createDataFrame(data, c("user", "item", "rating")) + #' model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item", + #'rank = 10, maxIter = 5, seed = 0) + #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item")) + #' predict(model, test) + # + # -- output of 'predict(model, data)' + # + # user item prediction + # 0 2 -0.1380762 + # 1 0 2.6258414 + # 2 0 -1.5018409 --- End diff -- It is not helpful if the code to produce the expect answer is the same as the test code. We can either use Scala code or simply remove this block of comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
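For reference, a hedged sketch of how the quoted expected values could be asserted directly in the R test (assuming the `model` and `test` objects from the comment block, that testthat is attached as in test_mllib.R, and that the prediction rows come back in the same user/item order as the test data):

    predictions <- collect(predict(model, test))
    expect_equal(predictions$prediction,
                 c(-0.1380762, 2.6258414, -1.5018409),
                 tolerance = 1e-4)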
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75179320 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, +val rRatingCol: String, +val rUserCol: String, +val rItemCol: String, +val rRegParam: Double, +val rMaxIter: Int) extends MLWritable { + + lazy val rUserFactors: DataFrame = alsm.userFactors + + lazy val rItemFactors: DataFrame = alsm.itemFactors + + lazy val rRank: Int = alsm.rank + + def transform(dataset: Dataset[_]): DataFrame = { +alsm.transform(dataset) + } + + override def write: MLWriter = new ALSWrapper.ALSWrapperWriter(this) +} + +private[r] object ALSWrapper extends MLReadable[ALSWrapper] { + + def fit(data: DataFrame, features: Array[String], rank: Int, regParam: Double, maxIter: Int, + implicitPrefs: Boolean, alpha: Double, nonnegative: Boolean, + distParams: Array[Int]): ALSWrapper = { --- End diff -- Why not using explicit params? If this is to avoid style check, you can turn it off by `// stylecheck:off` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75179030 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, +val rRatingCol: String, +val rUserCol: String, +val rItemCol: String, +val rRegParam: Double, +val rMaxIter: Int) extends MLWritable { + + lazy val rUserFactors: DataFrame = alsm.userFactors + + lazy val rItemFactors: DataFrame = alsm.itemFactors + + lazy val rRank: Int = alsm.rank + + def transform(dataset: Dataset[_]): DataFrame = { +alsm.transform(dataset) + } + + override def write: MLWriter = new ALSWrapper.ALSWrapperWriter(this) +} + +private[r] object ALSWrapper extends MLReadable[ALSWrapper] { + + def fit(data: DataFrame, features: Array[String], rank: Int, regParam: Double, maxIter: Int, --- End diff -- Chop down the arguments and use 4-space indentation. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75178881 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, +val rRatingCol: String, --- End diff -- Btw, can we get those metadata from the model object? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75178780 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, --- End diff -- `alsModel` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75178434 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/ALSWrapper.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.recommendation.{ALS, ALSModel} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class ALSWrapper private ( +val alsm: ALSModel, +val rRatingCol: String, --- End diff -- Is it the same as `ratingCol`? Then why do we need the `r` prefix? cc: @yanboliang --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r75004906 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,61 @@ test_that("spark.survreg", { } }) +test_that("spark.als", { + # R code to reproduce the result. + # + #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), + #' list(2, 1, 1.0), list(2, 2, 5.0)) + #' df <- createDataFrame(data, c("user", "item", "rating")) + #' model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item", + #'rank = 10, maxIter = 5, seed = 0) + #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item")) + #' predict(model, test) + # + # -- output of 'predict(model, data)' + # + # user item prediction --- End diff -- I think the usage exposed in this example has mostly been covered by the existing examples. Anything specific in mind? The algorithm does not guarantee non-negativeness unless specified in the arguments. A short answer would be a low predicted rating, if the ratings in the training data are all nonnegative. In fact, if no constraints put, the range of the predicted rating could be all real numbers. An alternative way is to use another function to map the value back to the desired region (e.g. 0-5). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
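To make the last point concrete, unconstrained predictions can be mapped back into the rating scale after the fact. A minimal SparkR sketch, assuming the `model` and `df` objects from the documentation example and a desired 0-5 range (the clamping step is illustrative, not part of the spark.als API):

    predicted <- predict(model, df)
    # clamp raw ALS predictions into [0, 5] using SparkR column functions
    clamped <- withColumn(predicted, "prediction_clamped",
                          least(greatest(predicted$prediction, lit(0)), lit(5)))
    showDF(clamped)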
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74999004 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} --- End diff -- Sounds good to me. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74868483 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) +#' @param ... additional named argument(s) such as \code{nonnegative}. +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) 
{ + +if (!is.numeric(rank) || rank <= 0) { + stop("rank should be a positive number.") +} +if (!is.numeric(reg) || reg < 0) { + stop("reg should be a nonnegative number.") +} +if (!is.numeric(maxIter) || maxIter <= 0) { + stop("maxIter should be a positive number.") +} + +`%||%` <- function(a, b) if (!is.null(a)) a else b + +args <- list(...) +numUserBlocks <- args$numUserBlocks %||% 10 +numItemBlocks <- args$numItemBlocks %||% 10 +implicitPrefs <- args$implicitPrefs %||% FALSE +alpha <- args$alpha %||% 1.0 +nonnegative <- args$nonnegative %||% FALSE +checkpointInterval <- args$checkpointInterval %||% 10 +seed <- args$seed %||% 0 + +features <- array(c(ratingCol, userCol, itemCol)) +distParams <- array(as.integer(c(numUserBlocks, numItemBlocks, +
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74868437 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) +#' @param ... additional named argument(s) such as \code{nonnegative}. +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) 
{ + +if (!is.numeric(rank) || rank <= 0) { + stop("rank should be a positive number.") +} +if (!is.numeric(reg) || reg < 0) { + stop("reg should be a nonnegative number.") +} +if (!is.numeric(maxIter) || maxIter <= 0) { + stop("maxIter should be a positive number.") +} + +`%||%` <- function(a, b) if (!is.null(a)) a else b + +args <- list(...) +numUserBlocks <- args$numUserBlocks %||% 10 +numItemBlocks <- args$numItemBlocks %||% 10 +implicitPrefs <- args$implicitPrefs %||% FALSE +alpha <- args$alpha %||% 1.0 +nonnegative <- args$nonnegative %||% FALSE +checkpointInterval <- args$checkpointInterval %||% 10 +seed <- args$seed %||% 0 + +features <- array(c(ratingCol, userCol, itemCol)) +distParams <- array(as.integer(c(numUserBlocks, numItemBlocks, +
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74867735 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) +#' @param ... additional named argument(s) such as \code{nonnegative}. +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) 
{ + +if (!is.numeric(rank) || rank <= 0) { + stop("rank should be a positive number.") +} +if (!is.numeric(reg) || reg < 0) { + stop("reg should be a nonnegative number.") +} +if (!is.numeric(maxIter) || maxIter <= 0) { + stop("maxIter should be a positive number.") +} + +`%||%` <- function(a, b) if (!is.null(a)) a else b + +args <- list(...) +numUserBlocks <- args$numUserBlocks %||% 10 +numItemBlocks <- args$numItemBlocks %||% 10 +implicitPrefs <- args$implicitPrefs %||% FALSE +alpha <- args$alpha %||% 1.0 +nonnegative <- args$nonnegative %||% FALSE +checkpointInterval <- args$checkpointInterval %||% 10 +seed <- args$seed %||% 0 + +features <- array(c(ratingCol, userCol, itemCol)) +distParams <- array(as.integer(c(numUserBlocks, numItemBlocks, +
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74867708 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) +#' @param ... additional named argument(s) such as \code{nonnegative}. +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame --- End diff -- seems like this should be `#' @aliases spark.als,SparkDataFrame-method` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74674616 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} --- End diff -- I see. I think it's ok though. R ALS has 16 parameters - https://cran.r-project.org/web/packages/ALS/ALS.pdf Since most parameters have default values people would just omit them. I think it is better to be explicit because otherwise it is easier to make mistakes on types (eg. c() vs list()), typos and so on. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
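The typo concern is easy to demonstrate in plain R: an argument that falls into `...` is silently dropped when its name is misspelled, whereas an explicit formal argument fails fast. The toy functions below are purely illustrative:

    # with `...`, a misspelled argument name is silently absorbed and ignored
    f_dots <- function(data, ...) {
      args <- list(...)
      isTRUE(args$nonnegative)
    }
    f_dots(NULL, nonegative = TRUE)      # FALSE -- the typo goes unnoticed

    # with an explicit formal argument, the same typo is an immediate error
    f_explicit <- function(data, nonnegative = FALSE) {
      isTRUE(nonnegative)
    }
    f_explicit(NULL, nonegative = TRUE)  # Error: unused argument (nonegative = TRUE)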
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74527111 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + --- End diff -- remove empty newline in roxygen2 block --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74528252 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) 
{ + +if (!is.numeric(rank) || rank <= 0) { + stop("rank should be a positive number.") +} +if (!is.numeric(reg) || reg < 0) { + stop("reg should be a nonnegative number.") +} +if (!is.numeric(maxIter) || maxIter <= 0) { + stop("maxIter should be a positive number.") +} + +`%||%` <- function(a, b) if (!is.null(a)) a else b + +args <- list(...) +numUserBlocks <- args$numUserBlocks %||% 10 +numItemBlocks <- args$numItemBlocks %||% 10 +implicitPrefs <- args$implicitPrefs %||% FALSE +alpha <- args$alpha %||% 1.0 +nonnegative <- args$nonnegative %||% FALSE +checkpointInterval <- args$checkpointInterval %||% 10 +seed <- args$seed %||% 0 + +features <- array(c(ratingCol, userCol, itemCol)) +distParams <- array(as.integer(c(numUserBlocks, numItemBlocks, + checkpointInterval, seed))) + +
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74537542 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} --- End diff -- I was thinking those parameters might be not as important as the ones in the list, and don't want the user to feel there are so many variables to tune for the algorithm? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74551161 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,61 @@ test_that("spark.survreg", { } }) +test_that("spark.als", { + # R code to reproduce the result. + # + #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), + #' list(2, 1, 1.0), list(2, 2, 5.0)) + #' df <- createDataFrame(data, c("user", "item", "rating")) + #' model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item", + #'rank = 10, maxIter = 5, seed = 0) + #' test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item")) + #' predict(model, test) + # + # -- output of 'predict(model, data)' + # + # user item prediction --- End diff -- It would be good to have these examples also in method documentation. Also, how do you interpret negatively predicted rating ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
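For reference, the quoted reproduction code as a self-contained SparkR snippet (assuming an active SparkR session and the spark.als API from this PR). On the interpretation question: with explicit-feedback ALS the least-squares fit is unconstrained, so predicted ratings can fall outside the observed rating range; passing nonnegative = TRUE constrains the latent factors and keeps predictions nonnegative.
```
ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
                list(2, 1, 1.0), list(2, 2, 5.0))
df <- createDataFrame(ratings, c("user", "item", "rating"))
model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item",
                   rank = 10, maxIter = 5, seed = 0)
test <- createDataFrame(list(list(0, 2), list(1, 0), list(2, 0)), c("user", "item"))
showDF(predict(model, test))
```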
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74536489 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), +#' list(2, 1, 1.0), list(2, 2, 5.0)) +#' df <- createDataFrame(ratings, c("user", "item", "rating")) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) 
{ + +if (!is.numeric(rank) || rank <= 0) { + stop("rank should be a positive number.") +} +if (!is.numeric(reg) || reg < 0) { + stop("reg should be a nonnegative number.") +} +if (!is.numeric(maxIter) || maxIter <= 0) { + stop("maxIter should be a positive number.") +} + +`%||%` <- function(a, b) if (!is.null(a)) a else b + +args <- list(...) +numUserBlocks <- args$numUserBlocks %||% 10 +numItemBlocks <- args$numItemBlocks %||% 10 +implicitPrefs <- args$implicitPrefs %||% FALSE +alpha <- args$alpha %||% 1.0 +nonnegative <- args$nonnegative %||% FALSE +checkpointInterval <- args$checkpointInterval %||% 10 +seed <- args$seed %||% 0 + +features <- array(c(ratingCol, userCol, itemCol)) +distParams <- array(as.integer(c(numUserBlocks, numItemBlocks, + checkpointInterval, seed))) + +jobj
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74527784 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} --- End diff -- Is there a reason we are preferring `...` vs naming these out like `maxIter` in the function definition on L714? if it's well known it's probably better to name them? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74527216 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) --- End diff -- please add documentation for `...` as for example `@param ... additional name arguments such as nonnegative` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
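A minimal roxygen line along the lines of that suggestion (exact wording up to the author) might be:
```
#' @param ... additional named argument(s) passed to the method, e.g. \code{nonnegative},
#'            \code{implicitPrefs}, \code{alpha}, \code{seed}, \code{numUserBlocks},
#'            \code{numItemBlocks} and \code{checkpointInterval}, as described above.
```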
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74116056 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(ratings) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) { + +`%||%` <- function(a, b) if (!is.null(a)) a else b + +args <- list(...) 
+numUserBlocks <- args$numUserBlocks %||% 10 +numItemBlocks <- args$numItemBlocks %||% 10 +implicitPrefs <- args$implicitPrefs %||% FALSE +alpha <- args$alpha %||% 1.0 +nonnegative <- args$nonnegative %||% FALSE +checkpointInterval <- args$checkpointInterval %||% 10 +seed <- args$seed %||% 0 + +features <- array(c(ratingCol, userCol, itemCol)) +distParams <- array(as.integer(c(numUserBlocks, numItemBlocks, + checkpointInterval, seed))) + +jobj <- callJStatic("org.apache.spark.ml.r.ALSWrapper", +"fit", data@sdf, features, as.integer(rank), +reg, as.integer(maxIter), implicitPrefs, alpha, nonnegative, --- End diff -- Done. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74112930 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(ratings) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) { + +`%||%` <- function(a, b) if (!is.null(a)) a else b --- End diff -- In this case (since we set 7 default values) the code would be a little repetitive if we expand every one of them. I guess this definition would not have side effect to the other functions in SparkR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
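On the side-effect question: a definition made inside the method body is scoped to that body, so it is not visible to other SparkR functions. A tiny standalone sketch (`f` is a hypothetical stand-in for the method):
```
f <- function(...) {
  `%||%` <- function(a, b) if (!is.null(a)) a else b   # local to this function only
  args <- list(...)
  args$alpha %||% 1.0
}
f(alpha = 0.5)   # 0.5
f()              # 1.0, the fallback default

# the operator-free alternative, repeated once per argument:
# alpha <- if (is.null(args$alpha)) 1.0 else args$alpha
```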
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74009550 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(ratings) --- End diff -- Good point. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r73999377 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(ratings) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) { + +`%||%` <- function(a, b) if (!is.null(a)) a else b + +args <- list(...) 
+numUserBlocks <- args$numUserBlocks %||% 10 +numItemBlocks <- args$numItemBlocks %||% 10 +implicitPrefs <- args$implicitPrefs %||% FALSE +alpha <- args$alpha %||% 1.0 +nonnegative <- args$nonnegative %||% FALSE +checkpointInterval <- args$checkpointInterval %||% 10 +seed <- args$seed %||% 0 + +features <- array(c(ratingCol, userCol, itemCol)) +distParams <- array(as.integer(c(numUserBlocks, numItemBlocks, + checkpointInterval, seed))) + +jobj <- callJStatic("org.apache.spark.ml.r.ALSWrapper", +"fit", data@sdf, features, as.integer(rank), +reg, as.integer(maxIter), implicitPrefs, alpha, nonnegative, --- End diff -- here, it doesn't check if `rank` or `maxIter` is positive. Maybe some checks on R-side would be good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please
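The kind of R-side guard being suggested here, and which a later revision in this thread adds at the top of the method body, looks like:
```
if (!is.numeric(rank) || rank <= 0) {
  stop("rank should be a positive number.")
}
if (!is.numeric(maxIter) || maxIter <= 0) {
  stop("maxIter should be a positive number.")
}
```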
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r73999041 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(ratings) +#' model <- spark.als(df, "rating", "user", "item") +#' +#' # extract latent factors +#' stats <- summary(model) +#' userFactors <- stats$userFactors +#' itemFactors <- stats$itemFactors +#' +#' # make predictions +#' predicted <- predict(model, df) +#' showDF(predicted) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' +#' # set other arguments +#' modelS <- spark.als(df, "rating", "user", "item", rank = 20, +#' reg = 0.1, nonnegative = TRUE) +#' statsS <- summary(modelS) +#' } +#' @note spark.als since 2.1.0 +setMethod("spark.als", signature(data = "SparkDataFrame"), + function(data, ratingCol = "rating", userCol = "user", itemCol = "item", + rank = 10, reg = 1.0, maxIter = 10, ...) { + +`%||%` <- function(a, b) if (!is.null(a)) a else b --- End diff -- Usually, in SparkR we haven't introduced such operators to solve this type of problems. Maybe it would be good to follow the convention... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r73998522 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers +#' @param rank rank of the matrix factorization (> 0) +#' @param reg regularization parameter (>= 0) +#' @param maxIter maximum number of iterations (>= 0) + +#' @return \code{spark.als} returns a fitted ALS model +#' @rdname spark.als +#' @aliases spark.als,SparkDataFrame +#' @name spark.als +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(ratings) --- End diff -- @junyangq, I think that it would be good if you show in examples how you create the `ratings` data.frame. Does R have a build-in ratings dataset ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
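R's built-in datasets package does not appear to ship a ratings-style dataset; the example this thread later converges on simply builds a small one inline, e.g.:
```
ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
                list(2, 1, 1.0), list(2, 2, 5.0))
df <- createDataFrame(ratings, c("user", "item", "rating"))
model <- spark.als(df, "rating", "user", "item")
```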
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r72880604 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,48 @@ test_that("spark.survreg", { } }) +test_that("spark.als", { + # R code to reproduce the result. + # + #' data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), --- End diff -- for code in comment, try ``` # nolint start # code # nolint end ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
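Applied to the commented-out reproduction code in the test, that suggestion would look roughly like:
```
# nolint start
# data <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
#              list(2, 1, 1.0), list(2, 2, 5.0))
# df <- createDataFrame(data, c("user", "item", "rating"))
# model <- spark.als(df, ratingCol = "rating", userCol = "user", itemCol = "item",
#                    rank = 10, maxIter = 5, seed = 0)
# nolint end
```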
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r72880581 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,137 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints} --- End diff -- could you add code example using these additional arguments? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
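The usage example the documentation later gained for these extra arguments (assuming df as constructed in the main example) is:
```
modelS <- spark.als(df, "rating", "user", "item", rank = 20,
                    reg = 0.1, nonnegative = TRUE)
statsS <- summary(modelS)
```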
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r72880568 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,137 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0)} +#'\item{seed}{seed for random number generation} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0)} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0)} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1)} +#'} +#' +#' @param data A SparkDataFrame for training +#' @param ratingCol column name for ratings +#' @param userCol column name for user ids. Ids must be (or can be cast into) integers --- End diff -- nit: cast is a foreign concept in R. perhaps `coerce`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r72880558 --- Diff: R/pkg/R/mllib.R --- @@ -61,7 +68,7 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' @name write.ml #' @export #' @seealso \link{spark.glm}, \link{glm} -#' @seealso \link{spark.kmeans}, \link{spark.naiveBayes}, \link{spark.survreg} +#' @seealso \link{spark.kmeans}, \link{spark.naiveBayes}, \link{spark.survreg}, \link{spark.als} --- End diff -- add to front since this is "als"? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14384 [Spark-16443][SparkR] Alternating Least Squares (ALS) wrapper ## What changes were proposed in this pull request? Add Alternating Least Squares wrapper in SparkR. Unit tests have been updated. ## How was this patch tested? SparkR unit tests. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) ![screen shot 2016-07-27 at 3 50 46 pm](https://cloud.githubusercontent.com/assets/15318264/17195348/f7a7d452-5411-11e6-845f-6d292283bc28.png) ![screen shot 2016-07-27 at 3 50 31 pm](https://cloud.githubusercontent.com/assets/15318264/17195347/f7a6352a-5411-11e6-8e21-61a48070192a.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/junyangq/spark SPARK-16443 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14384.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14384 commit ecf918546d3d25b60ece78cad42ec41c7c188f3d Author: Junyang QianDate: 2016-07-08T21:56:31Z ALS main added to mllib.R commit 766b55b464263879d90a0cce6f0d3215f3dba1e3 Author: Junyang Qian Date: 2016-07-11T20:04:09Z minimal wrapper commit e4cd463486577d544ec096fe59a603b33374a11c Author: Junyang Qian Date: 2016-07-21T22:03:25Z clean comments commit d0dfe7d21e99b8440de255a6a55f34971d08163f Author: Junyang Qian Date: 2016-07-25T18:31:40Z first set commit 7d0c139e427f8f6fef4ed8f6380c966ad998da39 Author: Junyang Qian Date: 2016-07-25T18:56:22Z add spark.als to namespace commit 243defa8c5a95923cdfab4d306624150d01b6121 Author: Junyang Qian Date: 2016-07-27T04:51:27Z ALS wrapper with summary, predict and write.ml commit a3cca04c54596ea466cd98edc4ee5e4611f207df Author: Junyang Qian Date: 2016-07-27T17:05:42Z fix typo in reading model commit d785b2a93e4e68661b661e08641bbb01dee8875f Author: Junyang Qian Date: 2016-07-27T21:10:52Z allow more arguments to als commit 3939ccce37cd899edf007d0ac3b9589748db0bc3 Author: Junyang Qian Date: 2016-07-27T22:15:00Z fix type issue in mllib.R --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org