RE: Column Similarity using DIMSUM

2015-03-19 Thread Manish Gupta 8
this is a hardware size issue and we should test it on larger machines? Regards, Manish From: Manish Gupta 8 [mailto:mgupt...@sapient.com] Sent: Wednesday, March 18, 2015 11:20 PM To: Reza Zadeh Cc: user@spark.apache.org Subject: RE: Column Similarity using DIMSUM Hi Reza, I have tried

RE: Column Similarity using DIMSUM

2015-03-19 Thread Manish Gupta 8
Thanks Reza. It makes perfect sense. Regards, Manish From: Reza Zadeh [mailto:r...@databricks.com] Sent: Thursday, March 19, 2015 11:58 PM To: Manish Gupta 8 Cc: user@spark.apache.org Subject: Re: Column Similarity using DIMSUM Hi Manish, With 56431 columns, the output can be as large as 56431
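A rough sizing sketch for the column count quoted in the snippet above. The snippet is cut off, so this does not reproduce Reza's own figure; it relies only on the standard fact that n columns form n(n-1)/2 distinct unordered pairs. The function name `max_column_pairs` is hypothetical, chosen for illustration, and is not part of Spark.

```python
def max_column_pairs(num_columns: int) -> int:
    """Upper bound on distinct column pairs (i < j) among num_columns columns."""
    if num_columns < 0:
        raise ValueError("num_columns must be non-negative")
    # Each unordered pair of distinct columns is counted once.
    return num_columns * (num_columns - 1) // 2


n = 56431  # column count quoted in the thread
pairs = max_column_pairs(n)
print(f"{n} columns -> up to {pairs:,} distinct column pairs")
```

With that many candidate pairs, the size of a dense all-pairs result is governed mainly by the column count, which is consistent with the suggestion elsewhere in the thread to experiment with the threshold argument.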

Re: Column Similarity using DIMSUM

2015-03-19 Thread Reza Zadeh
test it on larger machines? Regards, Manish *From:* Manish Gupta 8 [mailto:mgupt...@sapient.com] *Sent:* Wednesday, March 18, 2015 11:20 PM *To:* Reza Zadeh *Cc:* user@spark.apache.org *Subject:* RE: Column Similarity using DIMSUM Hi Reza, I have tried threshold to be only

RE: Column Similarity using DIMSUM

2015-03-18 Thread Manish Gupta 8
Subject: Re: Column Similarity using DIMSUM Hi Manish, Did you try calling columnSimilarities(threshold) with different threshold values? You could try threshold values of 0.1, 0.5, 1, 20, and higher. Best, Reza On Wed, Mar 18, 2015 at 10:40 AM, Manish Gupta 8 mgupt...@sapient.com

Re: Column Similarity using DIMSUM

2015-03-18 Thread Reza Zadeh
Hi Manish, Did you try calling columnSimilarities(threshold) with different threshold values? You could try threshold values of 0.1, 0.5, 1, 20, and higher. Best, Reza On Wed, Mar 18, 2015 at 10:40 AM, Manish Gupta 8 mgupt...@sapient.com wrote: Hi, I am running Column Similarity (All Pairs