Duplicates in Collaborative Filtering Output

Kartik Ohri Sun, 22 Jan 2023 23:43:34 -0800

Hi!

We are using the Spark MLlib ALS model (on Spark 3.2.0) for an
implicit-feedback collaborative filtering recommendation job. While looking
at the output of recommendForUserSubset
<https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/recommendation/ALSModel.html#recommendForUserSubset(dataset:org.apache.spark.sql.Dataset[_],numItems:Int):org.apache.spark.sql.DataFrame>,
we found duplicate (itemId, rating) pairs. The documentation isn't clear on
whether duplicate pairs can appear in the recommendation output. We are
currently deduplicating the results as a post-processing step, but I wanted
to confirm: are duplicates expected here in the first place, or is there
some issue with our usage that is causing these buggy results?
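A minimal sketch of a deduplication step of this kind follows. It is written in plain Python over one user's list of (itemId, rating) pairs rather than against the Spark DataFrame API. The function name dedupe_recommendations and the keep-the-highest-rating rule are illustrative assumptions, not necessarily the approach used in the job described above:

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of one user's recommendations: (itemId, rating) pairs.
Recommendation = Tuple[int, float]


def dedupe_recommendations(recs: Iterable[Recommendation], num_items: int) -> List[Recommendation]:
    """Drop repeated itemIds, keeping the highest rating seen for each,
    and return at most num_items pairs ordered by rating (descending)."""
    best: Dict[int, float] = {}
    for item_id, rating in recs:
        # Assumption: when an itemId repeats, keep its highest rating.
        if item_id not in best or rating > best[item_id]:
            best[item_id] = rating
    # Sort by rating descending; break ties by itemId so the order is deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:num_items]


if __name__ == "__main__":
    recs = [(10, 0.9), (7, 0.8), (10, 0.9), (3, 0.5)]
    print(dedupe_recommendations(recs, 3))
```

Dropping duplicates can leave fewer than numItems entries for a user, so one option is to request a few extra items from recommendForUserSubset and trim the list after deduplication.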


Thanks in advance!

Regards.
