Duplicates in Collaborative Filtering Output

2023-01-22 Thread Kartik Ohri
Hi!

We are using the Spark MLlib (on Spark 3.2.0) ALS model for an implicit
feedback based collaborative filtering recommendation job. While looking at
the output of recommendForUserSubset, we found duplicate (itemId, rating)
pairs. The documentation isn't clear on whether duplicate pairs can appear
in the recommendation outputs. We are currently deduplicating the results
as a post-processing step, but I wanted to confirm whether duplicates are
expected here in the first place or whether there's some issue with our
usage that is causing buggy results.

Thanks in advance!

Regards.
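
For readers following along, the post-processing deduplication mentioned
above could look roughly like the sketch below. It is not the poster's
actual code. It assumes the recommendations have already been collected
into plain Python as a dict mapping each user id to a list of
(itemId, rating) tuples; the function names and the sample data are
hypothetical and no Spark API is used.

```python
from collections import Counter


def find_duplicate_pairs(recs_by_user):
    """Return {user_id: [pair, ...]} for pairs that occur more than once.

    recs_by_user is an assumed shape: {user_id: [(item_id, rating), ...]}.
    """
    duplicates = {}
    for user_id, recs in recs_by_user.items():
        counts = Counter(recs)
        repeated = [pair for pair, n in counts.items() if n > 1]
        if repeated:
            duplicates[user_id] = repeated
    return duplicates


def dedupe_pairs(recs):
    """Drop repeated pairs, keeping the first occurrence and original order."""
    return list(dict.fromkeys(recs))


# Hypothetical sample data, only for illustration.
sample = {
    1: [(10, 0.9), (11, 0.8), (10, 0.9)],
    2: [(12, 0.7), (13, 0.6)],
}

print(find_duplicate_pairs(sample))
print(dedupe_pairs(sample[1]))
```

Reporting the duplicates before dropping them helps show which users are
affected, which can point to where in the pipeline the repeats come from.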


Re: Duplicates in Collaborative Filtering Output

2023-01-23 Thread Kartik Ohri
Hi again!

Ironically, soon after sending the previous email, I found the bug in our
setup that was causing the duplicates, and it wasn't MLlib ALS after all.
Sorry for the confusion.

Regards.

On Mon, Jan 23, 2023 at 1:09 PM Kartik Ohri wrote:

> Hi!
>
> We are using the Spark MLlib (on Spark 3.2.0) ALS model for an implicit
> feedback based collaborative filtering recommendation job. While looking at
> the output of recommendForUserSubset, we found duplicate (itemId, rating)
> pairs. The documentation isn't clear on whether duplicate pairs can appear
> in the recommendation outputs. We are currently deduplicating the results
> as a post-processing step, but I wanted to confirm whether duplicates are
> expected here in the first place or whether there's some issue with our
> usage that is causing buggy results.
>
> Thanks in advance!
>
> Regards.
>