Hi again! Ironically, soon after sending the previous email I found the bug in our setup that was causing the duplicates, and it wasn't MLlib ALS after all. Sorry for the confusion.
Regards.

On Mon, Jan 23, 2023 at 1:09 PM Kartik Ohri <kartikohr...@gmail.com> wrote:
> Hi!
>
> We are using the Spark MLlib (on Spark 3.2.0) ALS model for an implicit
> feedback based collaborative filtering recommendation job. While looking at
> the output of recommendForUserSubset
> <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/recommendation/ALSModel.html#recommendForUserSubset(dataset:org.apache.spark.sql.Dataset[_],numItems:Int):org.apache.spark.sql.DataFrame>,
> we found duplicate (itemId, rating) pairs. The documentation isn't clear on
> whether duplicate pairs can appear in the recommendation outputs. We are
> currently deduplicating the results as a post-processing step, but I wanted
> to confirm whether duplicates are expected here in the first place, or
> whether there's some issue with our usage that is causing buggy results.
>
> Thanks in advance!
>
> Regards.
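For anyone landing on this thread later, the deduplication post-processing the quoted email mentions can be sketched as follows. This is a plain-Python illustration with made-up sample data, not the actual Spark job; on the real DataFrame returned by recommendForUserSubset one would instead explode the recommendations column and drop duplicate rows per user.

```python
def dedupe_recommendations(recs):
    """Drop repeated (itemId, rating) pairs from one user's recommendation
    list, keeping the first occurrence so the ranking order is preserved."""
    seen = set()
    deduped = []
    for item_id, rating in recs:
        if (item_id, rating) not in seen:
            seen.add((item_id, rating))
            deduped.append((item_id, rating))
    return deduped

# Hypothetical output for one user, with a duplicated pair as described above.
recs_for_user = [(42, 0.91), (17, 0.88), (42, 0.91), (5, 0.73)]
print(dedupe_recommendations(recs_for_user))
# → [(42, 0.91), (17, 0.88), (5, 0.73)]
```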