GitHub user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12762#discussion_r61489680

--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
@@ -53,24 +53,43 @@ import org.apache.spark.util.random.XORShiftRandom
  */
 private[recommendation] trait ALSModelParams extends Params with HasPredictionCol {
   /**
-   * Param for the column name for user ids.
+   * Param for the column name for user ids. Ids must be integers. Other
+   * numeric types are supported for this column, but will be cast to integers as long as they
+   * fall within the integer value range.
    * Default: "user"
    * @group param
    */
-  val userCol = new Param[String](this, "userCol", "column name for user ids")
+  val userCol = new Param[String](this, "userCol", "column name for user ids. Must be within " +
+    "the integer value range.")

   /** @group getParam */
   def getUserCol: String = $(userCol)

   /**
-   * Param for the column name for item ids.
+   * Param for the column name for item ids. Ids must be integers. Other
+   * numeric types are supported for this column, but will be cast to integers as long as they
--- End diff --

Ah ok. Could cast to double or float here... I was just concerned about any
storage / performance impact, but if everything is pipelined through the
cast -> udf then no problem.

On Thu, 28 Apr 2016 at 21:27, Holden Karau <notificati...@github.com> wrote:

> In mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
> <https://github.com/apache/spark/pull/12762#discussion_r61487974>:
>
> >
> >    /** @group getParam */
> >    def getUserCol: String = $(userCol)
> >
> >    /**
> > -   * Param for the column name for item ids.
> > +   * Param for the column name for item ids. Ids must be integers. Other
> > +   * numeric types are supported for this column, but will be cast to integers as long as they
>
> Ah yes, I didn't notice the first cast from input type to Long - it seems
> like that would be OK[ish] most of the time (except with floats/doubles),
> but also with certain BigDecimal you could end up throwing away the high
> bits when going to a Long and a very out of range value would pass the
> range check.
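
For illustration, here is a minimal sketch of the high-bits concern raised in the quoted comment. It assumes a path that first narrows the id to Long and only then range-checks it. The names `IdCastSketch`, `viaLong` and `checked` are hypothetical and are not code from the PR; `checked` is just one possible stricter alternative.

```scala
object IdCastSketch {
  // Hypothetical helper (not from the PR): narrow to Long first, then
  // range-check the Long. BigDecimal.toLong keeps only the low-order
  // 64 bits when the value does not fit in a Long.
  def viaLong(n: BigDecimal): Option[Int] = {
    val asLong = n.toLong
    if (asLong >= Int.MinValue && asLong <= Int.MaxValue) Some(asLong.toInt)
    else None
  }

  // Hypothetical stricter variant: test the original value before narrowing.
  // isValidInt is false for out-of-range and for non-whole values.
  def checked(n: BigDecimal): Option[Int] =
    if (n.isValidInt) Some(n.toInt) else None

  def main(args: Array[String]): Unit = {
    val huge = BigDecimal("18446744073709551617") // 2^64 + 1
    println(viaLong(huge)) // Some(1): the out-of-range id slips through
    println(checked(huge)) // None
    println(checked(BigDecimal(42))) // Some(42)
  }
}
```

Checking the range on the original value, before any narrowing cast, is what stops a very out-of-range id from being silently mapped onto a valid one.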