this is a hardware size issue and we should test it
on larger machines?
Regards,
Manish
From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Wednesday, March 18, 2015 11:20 PM
To: Reza Zadeh
Cc: user@spark.apache.org
Subject: RE: Column Similarity using DIMSUM
Hi Reza,
I have tried
Thanks Reza. It makes perfect sense.
Regards,
Manish
From: Reza Zadeh [mailto:r...@databricks.com]
Sent: Thursday, March 19, 2015 11:58 PM
To: Manish Gupta 8
Cc: user@spark.apache.org
Subject: Re: Column Similarity using DIMSUM
Hi Manish,
With 56431 columns, the output can be as large as 56431
test it on larger machines?
Regards,
Manish
From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Wednesday, March 18, 2015 11:20 PM
To: Reza Zadeh
Cc: user@spark.apache.org
Subject: RE: Column Similarity using DIMSUM
Hi Reza,
I have tried threshold to be only
Subject: Re: Column Similarity using DIMSUM
Hi Manish,
Did you try calling columnSimilarities(threshold) with different threshold
values? You can try threshold values of 0.1, 0.5, 1, and 20, and higher.
Best,
Reza
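
To make the effect of the threshold concrete, here is a minimal sketch in plain Python, not Spark. It computes exact cosine similarities for every column pair by brute force and keeps only the pairs at or above a threshold. This is an illustration only: DIMSUM in Spark uses the threshold to decide how aggressively to sample, rather than filtering exact values, and the helper name `column_similarities` and the toy matrix are hypothetical. Thresholds in the loop were chosen arbitrarily for the toy data.

```python
import math
from itertools import combinations


def column_similarities(rows, threshold=0.0):
    """Brute-force cosine similarity for every column pair (i < j).

    Hypothetical helper for illustration; it is NOT Spark's DIMSUM
    implementation. Pairs whose similarity is below `threshold` are dropped.
    """
    n_cols = len(rows[0])
    # Transpose the row-oriented matrix into a list of columns.
    cols = [[row[j] for row in rows] for j in range(n_cols)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]

    result = {}
    for i, j in combinations(range(n_cols), 2):
        if norms[i] == 0.0 or norms[j] == 0.0:
            continue  # cosine similarity is undefined for an all-zero column
        dot = sum(a * b for a, b in zip(cols[i], cols[j]))
        sim = dot / (norms[i] * norms[j])
        if sim >= threshold:
            result[(i, j)] = sim
    return result


# Toy 3 x 4 matrix (made-up data).
rows = [
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
]

for t in (0.0, 0.1, 0.6, 0.9):
    kept = column_similarities(rows, t)
    print(f"threshold={t}: {len(kept)} column pairs kept")
```

Without a threshold, the number of candidate pairs grows quadratically with the number of columns, which is why raising the threshold is worth trying before moving to larger hardware.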
On Wed, Mar 18, 2015 at 10:40 AM, Manish Gupta 8 mgupt...@sapient.com
wrote:
Hi,
I am running Column Similarity (All Pairs