[ 
https://issues.apache.org/jira/browse/SPARK-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley closed SPARK-10115.
-------------------------------------
    Resolution: Not A Problem

I'll close this since it's really a lack of support for Long IDs.

> MLlib ALS training fails with java.lang.ClassCastException
> ----------------------------------------------------------
>
>                 Key: SPARK-10115
>                 URL: https://issues.apache.org/jira/browse/SPARK-10115
>             Project: Spark
>          Issue Type: Bug
>         Environment: first experienced on spark 1.2.1 but then also with 
> latest
> spark-1.4.1-bin-hadoop2.6
>            Reporter: Michal Laclavik
>
> I am running ALS collaborative filtering training on data which looks as 
> follows (sample by running "user_product.take(10)"):
> {code}
> [(1205640308657491975, 50233468418, 1.0),
>  (4743366459073625989, 50233472294, 1.0),
>  (4743366459073625989, 50233473253, 1.0),
>  (4743366459073625989, 75586230246, 1.0),
>  (4743366459073625989, 50233473248, 1.0),
>  (56766162624422850, 74848929776, 1.0),
>  (56766162624422850, 50233473397, 1.0),
>  (56766162624422850, 78185852309, 1.0),
>  (56766162624422850, 73533710263, 1.0),
>  (56766162624422850, 78185852319, 1.0)]
> {code}
> and then I call training on that RDD:
> {code}
> rank = 12
> iterations=5
> model = ALS.train(user_product, rank, iterations)
> {code}
> and I get following error:
> {code}
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> <ipython-input-54-4e711b94952d> in <module>()
>       2 rank = 12
>       3 iterations=5
> ----> 4 model = ALS.train(user_product, rank, iterations)
> /opt/spark/python/pyspark/mllib/recommendation.py in train(cls, ratings, 
> rank, iterations, lambda_, blocks, nonnegative, seed)
>     192               seed=None):
>     193         model = callMLlibFunc("trainALSModel", cls._prepare(ratings), 
> rank, iterations,
> --> 194                               lambda_, blocks, nonnegative, seed)
>     195         return MatrixFactorizationModel(model)
>     196 
> /opt/spark/python/pyspark/mllib/common.py in callMLlibFunc(name, *args)
>     126     sc = SparkContext._active_spark_context
>     127     api = getattr(sc._jvm.PythonMLLibAPI(), name)
> --> 128     return callJavaFunc(sc, api, *args)
>     129 
>     130 
> /opt/spark/python/pyspark/mllib/common.py in callJavaFunc(sc, func, *args)
>     119     """ Call Java Function """
>     120     args = [_py2java(sc, a) for a in args]
> --> 121     return _java2py(sc, func(*args))
>     122 
>     123 
> /opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in 
> __call__(self, *args)
>     536         answer = self.gateway_client.send_command(command)
>     537         return_value = get_return_value(answer, self.gateway_client,
> --> 538                 self.target_id, self.name)
>     539 
>     540         for temp_arg in temp_args:
> /opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)
>     298                 raise Py4JJavaError(
>     299                     'An error occurred while calling {0}{1}{2}.\n'.
> --> 300                     format(target_id, '.', name), value)
>     301             else:
>     302                 raise Py4JError(
> Py4JJavaError: An error occurred while calling o448.trainALSModel.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 
> in stage 57.0 failed 1 times, most recent failure: Lost task 9.0 in stage 
> 57.0 (TID 4187, localhost): java.lang.ClassCastException
> Driver stacktrace:
>       at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
>       at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
>       at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
>       at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>       at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
>       at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>       at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>       at scala.Option.foreach(Option.scala:236)
>       at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>       at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
>       at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
>       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to