[ https://issues.apache.org/jira/browse/SPARK-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Pentreath resolved SPARK-14891.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> ALS in ML never validates input schema
> --------------------------------------
>
>                 Key: SPARK-14891
>                 URL: https://issues.apache.org/jira/browse/SPARK-14891
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Nick Pentreath
>            Assignee: Nick Pentreath
>             Fix For: 2.0.0
>
>
> Currently, {{ALS.fit}} never validates the input schema. There is a
> {{transformSchema}} impl that calls {{validateAndTransformSchema}}, but it is
> never called in either {{ALS.fit}} or {{ALSModel.transform}}.
> This was highlighted in SPARK-13857 (and failing PySpark tests
> [here|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56849/consoleFull]) when
> adding a call to {{transformSchema}} in {{ALSModel.transform}} that actually
> validates the input schema. The PySpark docstring tests result in Long inputs
> by default, which fail validation as Int is required.
> Currently, the inputs for user and item ids are cast to Int, with no input
> type validation (or warning message). So users could pass in Long, Float,
> Double, etc. It's also not made clear anywhere in the docs that only Int
> types for user and item are supported.
> Enforcing validation seems the best option but might break user code that
> previously "just worked", especially in PySpark.
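To illustrate why casting user and item ids to Int without validation is risky, here is a minimal pure-Python sketch. It does not use Spark: the schema dict, `narrow_long_to_int32`, and `validate_id_columns` are hypothetical names made up for this illustration, and the wrap-around only models a JVM-style Long-to-Int narrowing rather than reproducing Spark's cast code.

```python
# Hypothetical stand-in for a DataFrame schema: column name -> type name.
# None of these names come from Spark; they exist only for this sketch.


def narrow_long_to_int32(n):
    """Model a JVM-style Long -> Int narrowing: keep the low 32 bits (two's complement)."""
    return (n + 2**31) % 2**32 - 2**31


def validate_id_columns(schema, id_cols=("user", "item")):
    """Raise instead of casting when an id column is not declared as 'int'."""
    for col in id_cols:
        if col not in schema:
            raise ValueError("missing id column: %s" % col)
        if schema[col] != "int":
            raise TypeError(
                "column %r must be int, got %s" % (col, schema[col])
            )


# Unchecked narrowing: two distinct Long ids end up as the same Int id.
print(narrow_long_to_int32(1))          # 1
print(narrow_long_to_int32(2**32 + 1))  # 1 (collides with the id above)

# Up-front validation: a Long id column is rejected rather than narrowed.
try:
    validate_id_columns({"user": "long", "item": "int", "rating": "float"})
except TypeError as err:
    print("rejected:", err)
```

Under these assumptions, validating up front turns a silent id collision into an explicit error, which is the trade-off described above: safer, but code that previously relied on the implicit cast would start failing.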