[ https://issues.apache.org/jira/browse/SPARK-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983644#comment-14983644 ]

Bryan Cutler edited comment on SPARK-10158 at 10/31/15 7:05 AM:
----------------------------------------------------------------

I think the best way to handle this from the PySpark side is to add something
like the following to {{ALS._prepare}}
([link|https://github.com/apache/spark/blob/master/python/pyspark/mllib/recommendation.py#L215]),
which is called before training:

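{noformat}
# Integer.MAX_VALUE (2^31 - 1), looked up through the JVM gateway
MAX_ID_VALUE = ratings.ctx._gateway.jvm.Integer.MAX_VALUE
# Fail fast if any user or product ID would overflow a Java int
if not ratings.filter(lambda x: x.user > MAX_ID_VALUE or
                      x.product > MAX_ID_VALUE).isEmpty():
    raise ValueError("Rating IDs must not exceed the max Java int %d." %
                     MAX_ID_VALUE)
{noformat}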
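The range check in that filter can be tried out without a cluster. Below is a minimal plain-Python sketch, assuming ratings are {{(user, product, rating)}} tuples; {{has_invalid_id}} and {{check_rating_ids}} are hypothetical names, not part of PySpark. Unlike the snippet above, it also rejects IDs below {{Integer.MIN_VALUE}}, since a Java int is signed, and it stops at the first bad element instead of counting them all.

```python
JAVA_INT_MIN = -(2 ** 31)   # java.lang.Integer.MIN_VALUE
JAVA_INT_MAX = 2 ** 31 - 1  # java.lang.Integer.MAX_VALUE, i.e. 2147483647


def has_invalid_id(rating):
    """Return True if the user or product ID of a (user, product, rating)
    tuple cannot be stored in a signed 32-bit Java int."""
    user, product, _ = rating
    return not (JAVA_INT_MIN <= user <= JAVA_INT_MAX and
                JAVA_INT_MIN <= product <= JAVA_INT_MAX)


def check_rating_ids(ratings):
    """Raise ValueError for the first rating whose IDs overflow a Java int.

    Scanning stops at the first offending element instead of counting all of
    them, much like preferring RDD.isEmpty() over RDD.count() > 0.
    """
    bad = next((r for r in ratings if has_invalid_id(r)), None)
    if bad is not None:
        raise ValueError(
            "Rating IDs must fit in a Java int [%d, %d], got user=%d product=%d"
            % (JAVA_INT_MIN, JAVA_INT_MAX, bad[0], bad[1]))


# Usage: Integer.MAX_VALUE itself is accepted, one past it is rejected.
check_rating_ids([(1, 2, 5.0), (2147483647, 3, 1.0)])
try:
    check_rating_ids([(1, 2, 5.0), (2147483648, 3, 1.0)])
except ValueError as err:
    print(err)
```

In PySpark the same predicate would go into the {{filter}} call, with the short-circuit coming from {{isEmpty()}} rather than {{next()}}.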

But an extra pass over the data is probably not worth the performance hit for this issue.

Edit: I meant the above as an alternative to explicitly checking each value
against 2^31, which could be done in the {{Rating}} constructor but seems like
too much of a hack to me.
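For comparison, the constructor approach amounts to eager validation at construction time. A minimal sketch, assuming Python 3; {{CheckedRating}} is a hypothetical namedtuple class, not PySpark's {{Rating}}:

```python
from collections import namedtuple

JAVA_INT_MIN = -(2 ** 31)   # java.lang.Integer.MIN_VALUE
JAVA_INT_MAX = 2 ** 31 - 1  # java.lang.Integer.MAX_VALUE


class CheckedRating(namedtuple("CheckedRating", ["user", "product", "rating"])):
    """Hypothetical rating record that rejects IDs outside the Java int range
    as soon as it is constructed."""

    __slots__ = ()

    def __new__(cls, user, product, rating):
        # Validate both IDs before building the tuple
        for name, value in (("user", user), ("product", product)):
            if not JAVA_INT_MIN <= int(value) <= JAVA_INT_MAX:
                raise ValueError(
                    "%s ID %d does not fit in a Java int [%d, %d]"
                    % (name, int(value), JAVA_INT_MIN, JAVA_INT_MAX))
        return super().__new__(cls, int(user), int(product), float(rating))


# Usage: valid IDs construct normally; an overflowing ID fails immediately.
ok = CheckedRating(1, 2, 4.5)
try:
    CheckedRating(2 ** 31, 2, 4.5)
except ValueError as err:
    print(err)
```

The trade-off is that every record pays for the check, and in a lazy RDD pipeline the error only surfaces when the records are actually built, which may be well after the data is handed to ALS.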


> ALS should print better errors when given Long IDs
> --------------------------------------------------
>
>                 Key: SPARK-10158
>                 URL: https://issues.apache.org/jira/browse/SPARK-10158
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib, PySpark
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> See [SPARK-10115] for the very confusing messages you get when you try to use 
> ALS with Long IDs.  We should catch and identify these errors and print 
> meaningful error messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
