Hi Sab,

Thanks for your response. We're thinking of trying a bigger cluster, because we 
started with just 2 nodes. What we really want to know is whether the code will 
scale up with larger matrices and more nodes. How large a matrix multiplication 
have you managed to do?

Is there an alternative you’d recommend for calculating similarity over a large 
dataset?

Thanks,
Eilidh
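
For readers of this thread, here is a minimal sketch in plain Python of the two 
ideas under discussion: how much memory a dense matrix of doubles needs, and how 
a matrix can be multiplied by its transpose one pair of row blocks at a time. It 
is not the Spark or MLlib API and not the blocking scheme Sab describes; the 
names dense_bytes, gram_blocked and block_rows are hypothetical and chosen only 
for illustration.

```python
from typing import List

Matrix = List[List[float]]


def dense_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold a dense rows x cols matrix of 8-byte doubles."""
    return rows * cols * bytes_per_value


def gram_blocked(a: Matrix, block_rows: int) -> Matrix:
    """Compute A * A^T by visiting one pair of row blocks at a time.

    Assumption: the whole matrix fits in local memory here. In a distributed
    setting each (i0, j0) block pair would be an independent task.
    """
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    starts = range(0, n, block_rows)
    for i0 in starts:
        for j0 in starts:
            if j0 < i0:
                continue  # A * A^T is symmetric, so mirror instead of recomputing
            for i in range(i0, min(i0 + block_rows, n)):
                for j in range(j0, min(j0 + block_rows, n)):
                    dot = sum(x * y for x, y in zip(a[i], a[j]))
                    result[i][j] = dot
                    result[j][i] = dot
    return result


if __name__ == "__main__":
    # Sizes mentioned in the thread, stored as dense doubles.
    print(dense_bytes(10_000, 10_000))    # input of the failing test case
    print(dense_bytes(200_000, 500_000))  # target input
    print(dense_bytes(200_000, 200_000))  # target result of A * A^T
    print(gram_blocked([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], block_rows=2))
```

Under the 8-bytes-per-value assumption, the 10,000 by 10,000 input is about 
0.8 GB, the 200,000 by 500,000 input about 800 GB, and its 200,000 by 200,000 
product about 320 GB, before any serialisation or shuffle overhead.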

On 13 Nov 2015, at 09:55, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> 
wrote:

> We have done this by blocking but without using BlockMatrix. We used our own 
> blocking mechanism because BlockMatrix didn't exist in Spark 1.2. What is the 
> size of your block? How much memory are you giving to the executors? I assume 
> you are running on YARN; if so, you will want to make sure the YARN executor 
> memory overhead is set higher than the default.
> 
> Just curious, could you also explain why you need matrix multiplication with 
> transpose? Smells like similarity computation.
> 
> Regards
> Sab
> 
> On Thu, Nov 12, 2015 at 7:27 PM, Eilidh Troup <e.tr...@epcc.ed.ac.uk> wrote:
> Hi,
> 
> I’m trying to multiply a large squarish matrix with its transpose. Eventually 
> I’d like to work with matrices of size 200,000 by 500,000. I started with 
> 100 by 100, which was fine, and then tried 10,000 by 10,000, which failed 
> with an out-of-memory exception.
> 
> I used MLlib and BlockMatrix and tried various block sizes, and also tried 
> switching disk serialisation on.
> 
> We are running on a small cluster, using a CSV file in HDFS as the input data.
> 
> Would anyone with experience of multiplying large, dense matrices in Spark be 
> able to comment on what to try to make this work?
> 
> Thanks,
> Eilidh
> 
> 
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 
> -- 
> 
> Architect - Big Data
> Ph: +91 99805 99458
> 
> Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan 
> India ICT)
