[ https://issues.apache.org/jira/browse/SPARK-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley closed SPARK-10115.
-------------------------------------
    Resolution: Not A Problem

I'll close this since it's really a lack of support for Long IDs.
> MLlib ALS training fails with java.lang.ClassCastException
> ----------------------------------------------------------
>
>                 Key: SPARK-10115
>                 URL: https://issues.apache.org/jira/browse/SPARK-10115
>             Project: Spark
>          Issue Type: Bug
>         Environment: first experienced on Spark 1.2.1, but then also with the latest spark-1.4.1-bin-hadoop2.6
>            Reporter: Michal Laclavik
>
> I am running ALS collaborative filtering training on data that looks as follows (sample obtained by running "user_product.take(10)"):
> {code}
> [(1205640308657491975, 50233468418, 1.0),
>  (4743366459073625989, 50233472294, 1.0),
>  (4743366459073625989, 50233473253, 1.0),
>  (4743366459073625989, 75586230246, 1.0),
>  (4743366459073625989, 50233473248, 1.0),
>  (56766162624422850, 74848929776, 1.0),
>  (56766162624422850, 50233473397, 1.0),
>  (56766162624422850, 78185852309, 1.0),
>  (56766162624422850, 73533710263, 1.0),
>  (56766162624422850, 78185852319, 1.0)]
> {code}
> and then I call training on that RDD:
> {code}
> rank = 12
> iterations=5
> model = ALS.train(user_product, rank, iterations)
> {code}
> and I get the following error:
> {code}
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> <ipython-input-54-4e711b94952d> in <module>()
>       2 rank = 12
>       3 iterations=5
> ----> 4 model = ALS.train(user_product, rank, iterations)
> /opt/spark/python/pyspark/mllib/recommendation.py in train(cls, ratings, rank, iterations, lambda_, blocks, nonnegative, seed)
>     192                 seed=None):
>     193         model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations,
> --> 194                               lambda_, blocks, nonnegative, seed)
>     195         return MatrixFactorizationModel(model)
>     196
> /opt/spark/python/pyspark/mllib/common.py in callMLlibFunc(name, *args)
>     126     sc = SparkContext._active_spark_context
>     127     api = getattr(sc._jvm.PythonMLLibAPI(), name)
> --> 128     return callJavaFunc(sc, api, *args)
>     129
>     130
> /opt/spark/python/pyspark/mllib/common.py in callJavaFunc(sc, func, *args)
>     119     """ Call Java Function """
>     120     args = [_py2java(sc, a) for a in args]
> --> 121     return _java2py(sc, func(*args))
>     122
>     123
> /opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
>     536         answer = self.gateway_client.send_command(command)
>     537         return_value = get_return_value(answer, self.gateway_client,
> --> 538                                         self.target_id, self.name)
>     539
>     540         for temp_arg in temp_args:
> /opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     298                 raise Py4JJavaError(
>     299                     'An error occurred while calling {0}{1}{2}.\n'.
> --> 300                     format(target_id, '.', name), value)
>     301             else:
>     302                 raise Py4JError(
> Py4JJavaError: An error occurred while calling o448.trainALSModel.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 57.0 failed 1 times, most recent failure: Lost task 9.0 in stage 57.0 (TID 4187, localhost): java.lang.ClassCastException
> Driver stacktrace:
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org