Re: large, dense matrix multiplication

2015-11-17 Thread Eilidh Troup
Hi Burak,

That’s interesting. I’ll give it a go.

Eilidh

On 14 Nov 2015, at 04:19, Burak Yavuz <brk...@gmail.com> wrote:

> Hi,
> 
> The BlockMatrix multiplication should be much more efficient on the current 
> master (and will be available with Spark 1.6). Could you please give that a 
> try if you have the chance?
> 
> Thanks,
> Burak
> 
> On Fri, Nov 13, 2015 at 10:11 AM, Sabarish Sasidharan 
> <sabarish.sasidha...@manthan.com> wrote:
> Hi Eilidh
> 
> Because you are multiplying with the transpose, you don't necessarily have 
> to build the right side of the multiplication. I hope you see that. You can 
> broadcast blocks of the IndexedRowMatrix to itself and achieve the multiplication.
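A minimal, framework-free sketch of that idea (plain Python, not Spark code; the block size is an arbitrary illustrative value): each pair of row blocks of A yields one block of A·Aᵀ, so the right-hand side never needs to be materialised separately.

```python
# Sketch: computing A * A^T block-by-block (plain Python, no Spark).
# Mirrors the idea of broadcasting row blocks of an IndexedRowMatrix
# to itself: block (bi, bj) of the product is the matrix of pairwise
# dot products between the rows of block bi and the rows of block bj.

def matmul_transpose_blocked(rows, block_size):
    """rows: list of row vectors (the matrix A).
    Returns A * A^T as a dict mapping (block_i, block_j) -> 2-D list."""
    n = len(rows)
    blocks = [rows[i:i + block_size] for i in range(0, n, block_size)]
    product = {}
    for bi, rb_i in enumerate(blocks):
        for bj, rb_j in enumerate(blocks):
            product[(bi, bj)] = [
                [sum(a * b for a, b in zip(r1, r2)) for r2 in rb_j]
                for r1 in rb_i
            ]
    return product

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = matmul_transpose_blocked(A, block_size=2)
# out[(0, 0)][0][0] is dot(row0, row0) = 1*1 + 2*2 = 5.0
```

In a distributed setting each (bi, bj) pair becomes an independent task, which is what makes the blocked formulation attractive.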
> 
> But for similarity computation you might want to use an approach like 
> locality-sensitive hashing first to identify a set of similar customers, and 
> then apply cosine similarity on that narrowed-down list. That would scale 
> much better than full matrix multiplication. You could try the following 
> options:
> 
> https://github.com/soundcloud/cosine-lsh-join-spark
> http://spark-packages.org/package/tdebatty/spark-knn-graphs
> https://github.com/marufaytekin/lsh-spark
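To illustrate the LSH-then-cosine idea independently of the packages above, here is a plain-Python sketch (random-hyperplane hashing; the number of planes and the seed are arbitrary): vectors are bucketed by which side of each random hyperplane they fall on, and exact cosine similarity is computed only within buckets.

```python
import random

def cosine(u, v):
    # Exact cosine similarity, used only on candidate pairs.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def lsh_signature(v, planes):
    # One bit per random hyperplane: which side of the plane v falls on.
    return tuple(int(sum(a * b for a, b in zip(v, p)) >= 0) for p in planes)

def candidate_pairs(vectors, num_planes=8, seed=42):
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
    buckets = {}
    for i, v in enumerate(vectors):
        buckets.setdefault(lsh_signature(v, planes), []).append(i)
    pairs = set()
    for ids in buckets.values():
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                pairs.add((ids[a], ids[b]))
    return pairs

vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
for i, j in sorted(candidate_pairs(vectors)):
    print(i, j, round(cosine(vectors[i], vectors[j]), 3))
```

Similar vectors collide in the same bucket with high probability, so the quadratic cosine step runs over a much smaller candidate set than all pairs.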
> 
> Regards
> Sab
> 
> Hi Sab,
> 
> Thanks for your response. We’re thinking of trying a bigger cluster, because 
> we just started with 2 nodes. What we really want to know is whether the code 
> will scale up with larger matrices and more nodes. I’d be interested to hear 
> how large a matrix multiplication you managed to do.
> 
> Is there an alternative you’d recommend for calculating similarity over a 
> large dataset?
> 
> Thanks,
> Eilidh
> 
> On 13 Nov 2015, at 09:55, Sabarish Sasidharan 
> <sabarish.sasidha...@manthan.com> wrote:
> 
>> We have done this by blocking, but without using BlockMatrix. We used our own 
>> blocking mechanism because BlockMatrix didn't exist in Spark 1.2. What is 
>> the size of your block? How much memory are you giving to the executors? I 
>> assume you are running on YARN; if so, you would want to make sure your YARN 
>> executor memory overhead is set higher than the default.
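For reference, in Spark 1.x that overhead is controlled by the `spark.yarn.executor.memoryOverhead` setting (in MB). A hypothetical submit line (the jar name and the memory values are placeholders to tune for your cluster):

```shell
# Hypothetical spark-submit raising the YARN executor memory overhead
# above its default; values here are illustrative only.
spark-submit \
  --master yarn \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  your_matrix_job.jar
```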
>> 
>> Just curious, could you also explain why you need matrix multiplication with 
>> transpose? Smells like similarity computation.
>> 
>> Regards
>> Sab
>> 
>> On Thu, Nov 12, 2015 at 7:27 PM, Eilidh Troup <e.tr...@epcc.ed.ac.uk> wrote:
>> Hi,
>> 
>> I’m trying to multiply a large, squarish matrix with its transpose. 
>> Eventually I’d like to work with matrices of size 200,000 by 500,000, but 
>> I’ve started off with 100 by 100, which was fine, and then 10,000 by 
>> 10,000, which failed with an out-of-memory exception.
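For a sense of scale, a quick back-of-envelope on the dense storage these sizes imply (plain Python; assumes 8-byte doubles and ignores any framework overhead):

```python
# Rough dense-matrix storage estimates for the sizes in this thread.
BYTES_PER_DOUBLE = 8
GB = 1024 ** 3

def dense_size_gb(rows, cols):
    return rows * cols * BYTES_PER_DOUBLE / GB

small = dense_size_gb(10_000, 10_000)      # the case that hit OOM: ~0.75 GB per copy
big = dense_size_gb(200_000, 500_000)      # the target matrix: ~745 GB
product = dense_size_gb(200_000, 200_000)  # its A*A^T product: ~298 GB
print(round(small, 2), round(big, 1), round(product, 1))
```

Even before multiplication, the target matrix alone is far beyond a two-node cluster's memory, which is why blocking and spilling strategies matter here.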
>> 
>> I used MLlib and BlockMatrix and tried various block sizes, and also tried 
>> switching disk serialisation on.
>> 
>> We are running on a small cluster, using a CSV file in HDFS as the input 
>> data.
>> 
>> Would anyone with experience of multiplying large, dense matrices in Spark 
>> be able to comment on what to try to make this work?
>> 
>> Thanks,
>> Eilidh
>> 
>> 
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>> 
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 
>> 
>> 
>> 
>> -- 
>> 
>> Architect - Big Data
>> Ph: +91 99805 99458
>> 
>> Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan 
>> India ICT)
>> +++
> 

