Re: Matrix Multiplication and mllib.recommendation

2015-06-28 Thread Ilya Ganelin
Ayman - it's really a question of recommending users to products vs. products
to users. There will only be a difference if you're not doing all-to-all:
for example, if you're producing only the top-N recommendations, then you may
recommend only the top N products or the top N users, which would be
different.

Re: Matrix Multiplication and mllib.recommendation

2015-06-28 Thread Ayman Farahat
Thanks Ilya
Is there an advantage to, say, partitioning by users/products when you train?
Here are the two alternatives I have:

# Partition by user or product
tot = newrdd.map(lambda l: (l[1], Rating(int(l[1]), int(l[2]), l[4]))).partitionBy(50).cache()
ratings = tot.values()
model = ALS.train(ratings, rank, numIterations)

# Use zipWithIndex
tot = newrdd.map(lambda l: (l[1], Rating(int(l[1]), int(l[2]), l[4])))
bob = tot.zipWithIndex().map(lambda x: (x[1], x[0])).partitionBy(30)
ratings = bob.values()
model = ALS.train(ratings, rank, numIterations)


Re: Matrix Multiplication and mllib.recommendation

2015-06-28 Thread Ilya Ganelin
Oops - the code should be:

val a = rdd.zipWithIndex().filter(s => s._2 < 100)  // keeps indices 0-99


Re: Matrix Multiplication and mllib.recommendation

2015-06-28 Thread Ilya Ganelin
You can also select pieces of your RDD by first doing a zipWithIndex and
then doing a filter operation on the second element of the RDD.

For example, to select the first 100 elements:

Val a = rdd.zipWithIndex().filter(s => 1 < s < 100)
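
A PySpark equivalent of the same idea (a sketch; the corrected Scala line is
in the "Oops" follow-up above):

first100 = rdd.zipWithIndex() \
    .filter(lambda kv: kv[1] < 100) \
    .map(lambda kv: kv[0])  # drop the index again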

Re: Matrix Multiplication and mllib.recommendation

2015-06-27 Thread Ayman Farahat
How do you partition by product in Python?
The only API I see is partitionBy(50).
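
For reference, partitionBy operates on the key of a pair RDD and also accepts
an optional partition function, so you can partition by product by keying on
the product id first. A sketch (ratings is assumed to be an RDD of Rating
objects):

# Key by product id, then hash-partition on that key (the default).
byProduct = ratings.map(lambda r: (r.product, r)).partitionBy(50)

# Or pass an explicit partition function on the key:
byProduct = ratings.map(lambda r: (r.product, r)).partitionBy(50, lambda pid: pid % 50)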


Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Nick Pentreath
Yup, numpy calls into BLAS for matrix multiply.
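
A quick way to see which BLAS your numpy build links against
(numpy.show_config() is a standard numpy utility):

import numpy
numpy.show_config()  # prints the BLAS/LAPACK libraries numpy was built with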

Sent from my iPad


Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Ayman Farahat
Thanks all for the help.
It turned out that using numpy matrix multiplication made a huge difference
in performance. I suspect that numpy already uses BLAS-optimized code.

Here is the Python code:

from numpy import array, asarray, matrix, squeeze

# This is where I load and directly test the predictions
myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
m1 = myModel.productFeatures().sample(False, 1.00)
m2 = m1.map(lambda (user, feature): feature).collect()
m3 = matrix(m2).transpose()  # rank x numProducts

pf = sc.broadcast(m3)
uf = myModel.userFeatures()

f1 = uf.map(lambda (userID, features): (userID,
    squeeze(asarray(matrix(array(features)) * pf.value))))
dog = f1.count()


Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Debasish Das
Also, in my experiments it's much faster to do blocked BLAS through cartesian
rather than sc.union. Here are the details on the experiments:

https://issues.apache.org/jira/browse/SPARK-4823
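
A minimal sketch of the cartesian-based blocking, assuming userBlocks and
productBlocks are pair RDDs of (blockId, stacked factor matrix) built from
the ALS factors (the names are illustrative, not from the JIRA):

import numpy as np

# userBlocks:    RDD of (userBlockId, np.ndarray of shape (usersInBlock, rank))
# productBlocks: RDD of (productBlockId, np.ndarray of shape (productsInBlock, rank))
# cartesian pairs every user block with every product block, so each task does
# one dense level-3 BLAS multiply instead of many small dot products.
scores = userBlocks.cartesian(productBlocks).map(
    lambda blocks: ((blocks[0][0], blocks[1][0]),
                    blocks[0][1].dot(blocks[1][1].T)))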


Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Debasish Das
Also, I'm not sure how threading helps here, because Spark assigns a
partition to each core. Each core may run multiple hardware threads if you
are using Intel hyper-threading, but I would let Spark handle the threading.


Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Debasish Das
We added SPARK-3066 for this. In 1.4 you should get the code to do the BLAS
dgemm-based calculation.


Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Ayman Farahat
Thanks Sabarish and Nick.
Would you happen to have some code snippets that you can share?
Best
Ayman

Re: Matrix Multiplication and mllib.recommendation

2015-06-17 Thread Sabarish Sasidharan
Nick is right. I too have implemented it this way and it works just fine. In
my case there can be even more products. You simply broadcast blocks of
products to userFeatures.mapPartitions() and BLAS multiply in there to get
recommendations. In my case 10K products form one block. Note that you would
then have to union your recommendations, and if there are lots of product
blocks, you might also want to checkpoint once every few blocks.
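
A minimal sketch of that pattern, assuming pf is a broadcast of
(productIds, P) where P is one block of product factors stacked into a numpy
array (the names are illustrative):

import numpy as np

def scoreBlock(userPartition):
    ids, P = pf.value                        # one product block, (blockSize, rank)
    users = list(userPartition)
    if not users:
        return iter([])
    U = np.asarray([f for (_, f) in users])  # (usersInPartition, rank)
    S = U.dot(P.T)                           # level-3 BLAS: all scores at once
    return iter((uid, zip(ids, row)) for (uid, _), row in zip(users, S))

blockRecs = myModel.userFeatures().mapPartitions(scoreBlock)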

Regards
Sab



-- 

Architect - Big Data
Ph: +91 99805 99458

Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan
India ICT)
+++


RE: Matrix Multiplication and mllib.recommendation

2015-06-17 Thread Nick Pentreath
One issue is that you broadcast the product vectors and then do a dot
product one-by-one with the user vector.

You should try forming a matrix of the item vectors and doing the dot
product as a matrix-vector multiply, which will make things a lot faster.
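
For example (a sketch of that idea, not Nick's exact code; pf here is assumed
to be a broadcast of the stacked item-factor matrix):

import numpy as np

# Stack all item factors once into a (numItems, rank) matrix and broadcast it.
itemMatrix = np.asarray(myModel.productFeatures().map(lambda kv: kv[1]).collect())
pf = sc.broadcast(itemMatrix)

# One BLAS matrix-vector multiply per user instead of numItems separate dots.
scores = myModel.userFeatures().map(
    lambda kv: (kv[0], pf.value.dot(np.asarray(kv[1]))))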

Another optimisation that is available in 1.4 is a recommendProducts method
that blockifies the factors to make use of level-3 BLAS (i.e. matrix-matrix
multiply). I am not sure if this is available in the Python API yet.

But you can do a version yourself by using mapPartitions over the user
factors, blocking the factors into sub-matrices and doing a matrix multiply
with the item factor matrix to get scores on a block-by-block basis.

Also, as Ilya says, more parallelism can help. I don't think LSH is really
necessary with only 30,000 items.

—
Sent from Mailbox


RE: Matrix Multiplication and mllib.recommendation

2015-06-17 Thread Ganelin, Ilya
I actually talk about this exact thing in a blog post here:
http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/.
Keep in mind, you're actually doing a ton of math. Even with proper caching
and use of broadcast variables this will take a while depending on the size
of your cluster. To get real results you may want to look into locality
sensitive hashing to limit your search space, and definitely look into
spinning up multiple threads to process your product features in parallel to
increase resource utilization on the cluster.



Thank you,
Ilya Ganelin



-Original Message-
From: afarahat [ayman.fara...@yahoo.com]
Sent: Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
To: user@spark.apache.org
Subject: Matrix Multiplication and mllib.recommendation


Hello;
I am trying to get predictions after running the ALS model.
The model works fine. For the prediction/recommendation step, I have about
30,000 products and 90 million users.
When I try to predict for all of them, it fails.
I have been trying to formulate the problem as a matrix multiplication where
I first get the product features, broadcast them, and then do a dot product.
It's still very slow. Any reason why?
Here is a sample code:

import numpy

def doMultiply(x):
    # Dot the user vector x against every broadcast product vector.
    a = []
    mylen = len(pf.value)
    for i in range(mylen):
        myprod = numpy.dot(x, pf.value[i][1])
        a.append(myprod)
    return a


myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
# I need to select which products to broadcast, but let's try all
m1 = myModel.productFeatures().sample(False, 0.001)
pf = sc.broadcast(m1.collect())
uf = myModel.userFeatures()
f1 = uf.map(lambda x: (x[0], doMultiply(x[1])))


