Hi,
Is there a document/link that describes the general configuration settings to
achieve maximum Spark performance while running on CDH5? In our environment, we
made a lot of changes (and are still making them) to get decent performance;
otherwise, our 6-node dev cluster with default configurations lags.
Thanks,
Manish
From: Evo Eftimov [mailto:evo.efti...@isecc.com]
Sent: Thursday, April 16, 2015 10:38 PM
To: Manish Gupta 8; user@spark.apache.org
Subject: RE: General configurations on CDH5 to achieve maximum Spark Performance
Well, there are a number of performance tuning guidelines in dedicated
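As a rough illustration (not taken from the thread), a spark-defaults.conf along these lines is a common starting point for a small CDH5/YARN cluster; every value below is an assumption to be tuned against your own node sizes, and the keys are the Spark 1.x names:

```properties
# Hypothetical starting values for a small CDH5/YARN cluster -- tune per node size
spark.master                    yarn-client
spark.serializer                org.apache.spark.serializer.KryoSerializer
spark.executor.memory           4g
spark.executor.cores            2
spark.executor.instances        6
spark.default.parallelism       24
spark.shuffle.consolidateFiles  true
```

Kryo serialization and an explicit parallelism level are usually the first two changes that matter; the memory/core split depends entirely on the hardware.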
Thanks for the information Andy. I will go through the versions mentioned in
Dependencies.scala to identify the compatibility.
Regards,
Manish
From: andy petrella [mailto:andy.petre...@gmail.com]
Sent: Tuesday, April 07, 2015 11:04 AM
To: Manish Gupta 8; user@spark.apache.org
Subject: Re
If I try to build spark-notebook with spark.version=1.2.0-cdh5.3.0, sbt
throws these warnings before failing to compile:
:: org.apache.spark#spark-yarn_2.10;1.2.0-cdh5.3.0: not found
:: org.apache.spark#spark-repl_2.10;1.2.0-cdh5.3.0: not found
Any suggestions?
Thanks
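For what it's worth, CDH-versioned artifacts are published in Cloudera's repository rather than Maven Central, so one likely fix (hedged: the repository URL, and whether spark-yarn_2.10/spark-repl_2.10 are actually published there for this version, are assumptions worth verifying) is to add the Cloudera resolver to the build:

```scala
// build.sbt fragment: resolve *-cdh5.3.0 artifacts from Cloudera's repo (assumed URL)
resolvers += "cloudera-repos" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
```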
From: Manish Gupta 8
Has anyone else faced this issue of running spark-shell (yarn client mode) in
an environment with strict firewall rules (on fixed allowed incoming ports)?
How can this be rectified?
Thanks,
Manish
From: Manish Gupta 8
Sent: Thursday, March 26, 2015 4:09 PM
To: user@spark.apache.org
Subject
Hi,
I am running spark-shell and connecting to a YARN cluster with deploy mode set
to client. In our environment, there are security policies that don't allow us
to open all TCP ports.
The issue I am facing is that the Spark Shell driver uses a random port for
the BlockManagerId -
Do you think this is a hardware size issue and we should test it
on larger machines?
Regards,
Manish
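One hedged workaround for the random-port problem is to pin the driver-side services to fixed ports in spark-defaults.conf, so the firewall can whitelist just those; the keys below are the Spark 1.x property names, and the port numbers are placeholders:

```properties
# Pin driver-side services to fixed ports (placeholder port numbers)
spark.driver.port           40000
spark.fileserver.port       40001
spark.broadcast.port        40002
spark.replClassServer.port  40003
spark.blockManager.port     40004
```

The executors must be able to reach the driver on these ports, so they need to be opened in both directions between the gateway host and the cluster nodes.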
From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Wednesday, March 18, 2015 11:20 PM
To: Reza Zadeh
Cc: user@spark.apache.org
Subject: RE: Column Similarity using DIMSUM
Hi Reza,
I have tried
Thanks Reza. It makes perfect sense.
Regards,
Manish
From: Reza Zadeh [mailto:r...@databricks.com]
Sent: Thursday, March 19, 2015 11:58 PM
To: Manish Gupta 8
Cc: user@spark.apache.org
Subject: Re: Column Similarity using DIMSUM
Hi Manish,
With 56431 columns, the output can be as large as 56431
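To make the scale concrete: an all-pairs similarity over n columns produces up to n(n-1)/2 entries, so 56431 columns can yield roughly 1.59 billion pairs. A quick check:

```scala
object PairCount {
  // Number of unordered column pairs in an all-pairs similarity job
  def pairCount(n: Long): Long = n * (n - 1) / 2

  def main(args: Array[String]): Unit = {
    println(pairCount(56431L)) // 1592200665, i.e. ~1.59 billion pairs
  }
}
```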
Hi,
I am running Column Similarity (All Pairs Similarity using DIMSUM) in Spark on
a dataset that looks like (Entity, Attribute, Value), after transforming it
into a row-oriented dense matrix format (one line per Attribute, one column
per Entity, each cell holding a normalized value between 0 and 1).
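For intuition about what columnSimilarities computes on such a matrix, here is a minimal pure-Scala sketch (no Spark) of the brute-force cosine similarity that DIMSUM approximates at scale; the two toy columns are made up for illustration:

```scala
object CosineSim {
  // Cosine similarity between two columns: the quantity DIMSUM approximates
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot   = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    dot / (normA * normB)
  }

  def main(args: Array[String]): Unit = {
    // Two toy Entity columns over three Attributes, values normalized to [0, 1]
    val e1 = Array(1.0, 0.5, 0.0)
    val e2 = Array(0.5, 1.0, 0.0)
    println(f"${cosine(e1, e2)}%.4f") // 0.8000
  }
}
```

DIMSUM's threshold parameter trades accuracy for compute by sampling away low-magnitude columns rather than computing every dot product exactly.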
Hi Reza,
I have tried thresholds only in the range of 0 to 1. I was not aware that the
threshold could be set above 1.
Will try and update.
Thank You
- Manish
From: Reza Zadeh [mailto:r...@databricks.com]
Sent: Wednesday, March 18, 2015 10:55 PM
To: Manish Gupta 8
Cc: user@spark.apache.org