Hi Roberto

The predictAll method in PySpark calls the underlying predict method in
Scala, which takes an RDD of (userId, productId) pairs. In other words,
predictAll returns a predicted score for each pair in its input, and
nothing else. That is exactly the output you see (3 predictions for user 1
and 1 prediction for each of the other users), because your test RDD
contains the same pairs as your training data set.

Doc string:

        """
        Returns a list of predicted ratings for input user and product
        pairs.
        """
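
To make the pairwise behaviour concrete, here is a minimal pure-Python
sketch. It is not Spark code: the factor values and the predict_all helper
are made up for illustration, and it only assumes that a score is the dot
product of a user factor vector and a product factor vector. The point is
that only the pairs you pass in get scored.

```python
# Hypothetical latent factors; a real ALS model learns these from data.
user_factors = {1: [1.0, 0.5], 2: [0.2, 0.9]}
product_factors = {1: [0.9, 0.1], 2: [0.8, 0.3], 4: [0.1, 0.7]}


def predict_all(pairs):
    """Score only the (user, product) pairs that were passed in."""
    results = []
    for user, product in pairs:
        u = user_factors[user]
        p = product_factors[product]
        score = sum(a * b for a, b in zip(u, p))
        results.append((user, product, score))
    return results


# Only the pairs seen in training: one output row per input pair.
seen = predict_all([(1, 1), (1, 2), (2, 4)])

# To score every product for user 1, pass every (1, product) pair explicitly.
all_for_user_1 = predict_all([(1, p) for p in sorted(product_factors)])
```

With the real model, the analogous step would be to build an RDD holding
(1, productId) for every product ID and pass that to predictAll.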

By contrast, recommendProducts returns the top-K predicted products for a
given user, drawn from the products in the model (which is why you see
only 8 predictions even though you asked for 10, as there are only 8
product IDs).

Doc string:

        """
        Recommends the top "num" number of products for a given user and
        returns a list of Rating objects sorted by the predicted rating in
        descending order.
        """
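
The top-K behaviour can be sketched in the same spirit. Again this is
plain Python with invented factor values and a hypothetical
recommend_products helper, not Spark's implementation. It scores the user
against every known product, then keeps the "num" highest scores in
descending order.

```python
import heapq

# Hypothetical latent factors; a real ALS model learns these from data.
user_factors = {1: [1.0, 0.5]}
product_factors = {
    1: [0.9, 0.1],
    2: [0.7, 0.3],
    3: [0.4, 0.4],
    4: [0.1, 0.2],
}


def recommend_products(user, num):
    """Score the user against every product and keep the top `num`."""
    u = user_factors[user]
    scored = [
        (user, product, sum(a * b for a, b in zip(u, p)))
        for product, p in product_factors.items()
    ]
    # nlargest returns at most `num` items, sorted by score descending.
    return heapq.nlargest(num, scored, key=lambda r: r[2])


# Asking for 10 yields only 4 rows here, because only 4 products exist.
top = recommend_products(1, 10)
```

Asking for more items than there are products simply returns one row per
product, which mirrors the 8 rows in your output.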

Hope this helps.

On Fri, Dec 18, 2015 at 10:42 PM, Roberto Pagliari <
roberto.pagli...@asos.com> wrote:

> I created the following data, data.file
>
> 1 1
> 1 2
> 1 3
> 2 4
> 3 5
> 4 6
> 5 7
> 6 1
> 7 2
> 8 8
>
> The following code:
>
>     def parse_line(line):
>         tokens = line.split(' ')
>         return (int(tokens[0]), int(tokens[1])), 1.0
>
>     lines = sc.textFile('./data.file')
>     linesTest = sc.textFile('./data.file')
>
>     trainingRDD = lines.map(parse_line)\
>                        .map(lambda l: Rating(l[0][0], l[0][1], l[1]))
>
>     testRDD = linesTest.map(parse_line)\
>                        .map(lambda x: (x[0][0], x[0][1]))
>     rank = 5
>     numIterations = 5
>     model = ALS.trainImplicit(ratings=trainingRDD,
>                               rank=rank,
>                               iterations=numIterations)
>
>     res = model.predictAll(testRDD).collect()
>
>     for item in res: print item
>
> produces the following output:
>
>     Rating(user=4, product=6, rating=0.6767983278562415)
>     Rating(user=6, product=1, rating=0.620394043421327)
>     Rating(user=8, product=8, rating=0.43915435032205224)
>     Rating(user=2, product=4, rating=0.6712931344760976)
>     Rating(user=1, product=2, rating=1.058575470286403)
>     Rating(user=1, product=1, rating=1.0710334376535875)
>     Rating(user=1, product=3, rating=0.7958297361341067)
>     Rating(user=7, product=2, rating=0.6183187594872994)
>     Rating(user=3, product=5, rating=0.862203908436539)
>     Rating(user=5, product=7, rating=0.8487787055836538)
>
> By changing this line
>
>     res = model.predictAll(testRDD).collect()
>
> to that
>
>     res = model.recommendProducts(1, 10)
>
> The output is
>
>     Rating(user=1, product=2, rating=1.0664127057236918)
>     Rating(user=1, product=1, rating=1.054581213757793)
>     Rating(user=1, product=3, rating=0.7844128375421406)
>     Rating(user=1, product=6, rating=0.021054889001335786)
>     Rating(user=1, product=7, rating=0.0190815148087915)
>     Rating(user=1, product=8, rating=0.016932852980070745)
>     Rating(user=1, product=5, rating=0.005659639719215903)
>     Rating(user=1, product=4, rating=-0.007570583694108901)
>
> Why is it that most of these ratings do not show up when using predictAll?
>
>
