[GitHub] spark pull request: [SPARK-14891][ML] Add schema validation for AL...

MLnick Thu, 28 Apr 2016 12:38:44 -0700

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12762#discussion_r61489680
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -53,24 +53,43 @@ import org.apache.spark.util.random.XORShiftRandom
      */
     private[recommendation] trait ALSModelParams extends Params with HasPredictionCol {
       /**
    -   * Param for the column name for user ids.
    +   * Param for the column name for user ids. Ids must be integers. Other
    +   * numeric types are supported for this column, but will be cast to integers as long as they
    +   * fall within the integer value range.
        * Default: "user"
        * @group param
        */
    -  val userCol = new Param[String](this, "userCol", "column name for user ids")
    +  val userCol = new Param[String](this, "userCol", "column name for user ids. Must be within " +
    +    "the integer value range.")
     
       /** @group getParam */
       def getUserCol: String = $(userCol)
     
       /**
    -   * Param for the column name for item ids.
    +   * Param for the column name for item ids. Ids must be integers. Other
    +   * numeric types are supported for this column, but will be cast to integers as long as they
    --- End diff --
    
    Ah ok. Could cast to double or float here... I was just concerned about any
    storage / performance impact, but if everything is pipelined through the
    cast -> udf then no problem.
    On Thu, 28 Apr 2016 at 21:27, Holden Karau <notificati...@github.com> wrote:
    
    > In mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
    > <https://github.com/apache/spark/pull/12762#discussion_r61487974>:
    >
    > >
    > >    /** @group getParam */
    > >    def getUserCol: String = $(userCol)
    > >
    > >    /**
    > > -   * Param for the column name for item ids.
    > > +   * Param for the column name for item ids. Ids must be integers. Other
    > > +   * numeric types are supported for this column, but will be cast to integers as long as they
    >
    > Ah yes, I didn't notice the first cast from input type to Long - it seems
    > like that would be OK[ish] most of the time (except with floats/doubles),
    > but also with certain BigDecimal you could end up throwing away the high
    > bits when going to a Long and a very out of range value would pass the
    > range check.
    >
    > You are receiving this because you authored the thread.
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/12762/files/73ea0b62f1c0ae6a9897ec83f5c8dfedea86f3f9#r61487974>
    >
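
To make the BigDecimal concern quoted above concrete, here is a minimal plain-Scala sketch (no Spark involved). It assumes the narrowing behaves like `java.math.BigDecimal.longValue()`, which keeps only the low-order 64 bits when the value does not fit in a long. The object and helper names (`IdRangeCheckSketch`, `checkAfterLongCast`, `checkBeforeCast`) are hypothetical and are not taken from the pull request.

```scala
import java.math.BigDecimal

object IdRangeCheckSketch {
  // Hypothetical helper: range-check AFTER narrowing to Long,
  // i.e. the pattern the quoted comment is worried about.
  def checkAfterLongCast(id: BigDecimal): Boolean = {
    val asLong = id.longValue() // keeps only the low-order 64 bits
    asLong >= Int.MinValue && asLong <= Int.MaxValue
  }

  // Hypothetical alternative: compare in the original type,
  // before any narrowing conversion takes place.
  def checkBeforeCast(id: BigDecimal): Boolean =
    id.compareTo(BigDecimal.valueOf(Int.MinValue.toLong)) >= 0 &&
      id.compareTo(BigDecimal.valueOf(Int.MaxValue.toLong)) <= 0

  def main(args: Array[String]): Unit = {
    // 2^64 + 5 is far outside the Int range, but its low 64 bits equal 5.
    val huge = new BigDecimal(2).pow(64).add(new BigDecimal(5))
    println(huge.longValue())          // the narrowed value
    println(checkAfterLongCast(huge))  // passes the check despite being out of range
    println(checkBeforeCast(huge))     // rejected when compared before narrowing
  }
}
```

Under these assumptions the value is accepted by the check that runs after the cast and rejected by the check that runs before it, which is the gap the comment describes. Whether the actual cast in the PR truncates the same way depends on how Spark implements the Decimal-to-Long conversion.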



