Dong Wang created SPARK-29828:
---------------------------------

             Summary: Missing persist on ratings in ml.recommendation.ALS.train
                 Key: SPARK-29828
                 URL: https://issues.apache.org/jira/browse/SPARK-29828
             Project: Spark
          Issue Type: Sub-task
          Components: ML
    Affects Versions: 2.4.3
            Reporter: Dong Wang

ml.recommendation.ALS.train() calls ratings.isEmpty(). isEmpty() is not a cheap metadata check: it runs an action, take(1), on the RDD.

{code:scala}
def isEmpty(): Boolean = withScope {
  partitions.length == 0 || take(1).length == 0
}
{code}

As a result, the ratings RDD is used by multiple actions. It should be persisted before the first one, and unpersisted once its child RDD has been persisted and materialized.

{code:scala}
def train[ID: ClassTag]( // scalastyle:ignore
    ratings: RDD[Rating[ID]],
    ...
  require(!ratings.isEmpty(), s"No ratings available from $ratings") // first use of ratings
  ...
  val blockRatings = partitionRatings(ratings, userPart, itemPart)
    .persist(intermediateRDDStorageLevel)
  val (userInBlocks, userOutBlocks) =
    makeBlocks("user", blockRatings, userPart, itemPart, intermediateRDDStorageLevel)
  userOutBlocks.count() // materialize blockRatings and user blocks
  // ratings should be unpersisted here
{code}

This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
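
For illustration only, here is a minimal sketch of the persist/unpersist pattern suggested above. It is not the actual ALS.train code and not a proposed patch. PersistSketch and trainSketch are hypothetical names, ratings are plain (user, item, rating) tuples rather than Rating[ID], the modulo grouping is a simplified stand-in for partitionRatings, and MEMORY_AND_DISK is an assumed storage level.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical stand-in for the start of ALS.train; not the actual Spark code.
  // Ratings are (user, item, rating) tuples instead of Rating[ID].
  def trainSketch(ratings: RDD[(Int, Int, Float)]): Long = {
    // Only persist (and later unpersist) if the caller has not cached the RDD already.
    val persistedHere = ratings.getStorageLevel == StorageLevel.NONE
    if (persistedHere) ratings.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // Action 1: isEmpty() calls take(1) internally.
      require(!ratings.isEmpty(), s"No ratings available from $ratings")

      // Simplified stand-in for partitionRatings: group ratings into 4 user blocks.
      val blockRatings = ratings
        .map { case (user, item, rating) => (user % 4, (item, rating)) }
        .groupByKey()
        .persist(StorageLevel.MEMORY_AND_DISK)
      try {
        // Action 2: materializes blockRatings. Partitions of ratings that
        // action 1 already computed are reused from the cache.
        blockRatings.count()
      } finally {
        // The sketch stops here, so the child RDD is released as well.
        blockRatings.unpersist()
      }
    } finally {
      // The child has been materialized, so the parent cache is no longer needed.
      if (persistedHere) ratings.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist-sketch")
    val sc = new SparkContext(conf)
    try {
      val ratings = sc.parallelize(1 to 1000, 4).map(x => (x % 10, x % 7, x.toFloat))
      println(s"user blocks: ${trainSketch(ratings)}")
    } finally {
      sc.stop()
    }
  }
}
```

The getStorageLevel check is a design choice in this sketch: calling persist() with a different level on an RDD that already has one throws an exception, and unpersisting an RDD the caller cached would discard the caller's cache.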