Dear all,
after some fiddling I have arrived at this solution:
/**
 * Customized left outer join on a common column, keeping only one copy
 * of the join column in the result.
 */
def leftOuterJoinWithRemovalOfEqualColumn(leftDF: DataFrame, rightDF: DataFrame,
    commonColumnName: String): DataFrame = {
  // one possible completion: joining on a Seq of column names produces a
  // left outer join with a single copy of the common column
  val joinedDF = leftDF.join(rightDF, Seq(commonColumnName), "left_outer")
  joinedDF
}
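If it helps, this is how the helper above would be called; the DataFrame names are hypothetical and a live SparkSession is assumed:

```scala
// hypothetical DataFrames that share the column "id"
val result = leftOuterJoinWithRemovalOfEqualColumn(ordersDF, customersDF, "id")
result.show()
```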
Hi, I am trying to configure a history server for my application.
When I run locally (./run-example SparkPi), the event logs are created
and I can start the history server.
But when I try
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster
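For the YARN case, event logging usually has to be enabled explicitly in spark-defaults.conf so that the application writes somewhere the history server can read; the HDFS path below is an assumption, not a required value:

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.history.fs.logDirectory    hdfs:///spark-logs
```

The history server is then started with sbin/start-history-server.sh and should pick up finished applications from that directory.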
Hi, all
I upgraded Spark to 1.4.1, and many applications failed... I find that the
heap is not full, but the CoarseGrainedExecutorBackend process takes more
memory than I expect, and its usage keeps growing over time; once it exceeds
the server's limit, the worker dies.
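Since the heap itself is not full, the growth may be native/off-heap usage counted against the container limit; on YARN the allowance for that can be raised. A sketch for spark-defaults.conf, where the value is an assumption (in MB for Spark 1.4.x):

```
# extra off-heap allowance per executor container (MB in Spark 1.4.x)
spark.yarn.executor.memoryOverhead   1024
```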
Hello,
I am not an expert with Spark, but the error thrown by Spark seems to
indicate that there is not enough memory to launch the job. By default, Spark
allocates 1GB of memory; maybe you should increase it?
Best regards
Fabrice
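To make that suggestion concrete, driver and executor memory can both be raised on the command line; the jar path and sizes below are assumptions for illustration:

```shell
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  lib/spark-examples.jar 100
```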
On Sat, Aug 1, 2015 at 10:51 PM, Connor Zanin cnnr...@udel.edu wrote:
https://spark-summit.org/2015/events/making-sense-of-spark-performance/
On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus edel...@gmail.com wrote:
Hi All!
How important would a significant performance improvement to TCP/IP itself
be, in terms of overall job performance? Which part
2% huh.
-- ttfn
Simon Edelhaus
California 2015
On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra m...@clearstorydata.com
wrote:
On Sat, Aug 1, 2015 at 9:25 AM, Akmal Abbasov akmal.abba...@icloud.com
wrote:
When I run locally (./run-example SparkPi), the event logs are created
and I can start the history server.
But when I try
./spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-cluster
Sent from my iPad
On 2014-9-24, at 8:13 AM, Steve Lewis lordjoe2...@gmail.com wrote:
When I experimented with an InputFormat I had used in Hadoop for a long
time, I found
1) it must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated
class, not
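For reference, the old-API (org.apache.hadoop.mapred) formats are the ones accepted by SparkContext.hadoopFile, which is likely why the deprecated base class is required there; the input path below is an assumption:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// assuming an existing SparkContext `sc`; note TextInputFormat here is the
// old-API class from org.apache.hadoop.mapred, not the mapreduce one
val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///input")
```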
1. I believe that the default memory (per executor) is 512m (from the
documentation)
2. I have increased the memory used by Spark on the workers in my launch
script when submitting the job (--executor-memory 124g)
3. The job completes successfully; it is the road bumps in the middle I am
You should also take into account the amount of memory that you plan to use.
It's advised not to give too much memory to each executor... otherwise GC
overhead will go up.
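If you do end up with large executors, GC behaviour can be inspected and tuned through the executor JVM options; the flags below are a common starting point, not a prescription:

```
spark.executor.extraJavaOptions  -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails
```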
Btw, why prime numbers?
--
Ruslan Dautkhanov
On Wed, Jul 29, 2015 at 3:31 AM, ponkin alexey.pon...@ya.ru wrote:
Hi Rahul,
Hi All!
How important would a significant performance improvement to TCP/IP itself
be, in terms of overall job performance? Which part would be most
significantly accelerated?
Would it be HDFS?
-- ttfn
Simon Edelhaus
California 2015
If your network is bandwidth-bound, you'll see that setting jumbo frames
(MTU 9000) may increase bandwidth by up to ~20%.
http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
Enabling Jumbo Frames across the cluster improves bandwidth
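On Linux the MTU change itself is a one-liner per host; the interface name is a placeholder, and the switches between the hosts must support jumbo frames end to end:

```
# check the current MTU (eth0 is a placeholder for your interface)
ip link show eth0
# raise it to 9000; requires root and switch support along the path
sudo ip link set dev eth0 mtu 9000
```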
If Spark workload is not network
Hi Ocatavian,
Just out of curiosity, did you try persisting your RDD in a serialized
format (MEMORY_AND_DISK_SER or MEMORY_ONLY_SER)?
i.e. changing your:
rdd.persist(MEMORY_AND_DISK)
to
rdd.persist(MEMORY_ONLY_SER)
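Spelled out with the required import; `sc` and the RDD contents are placeholders:

```scala
import org.apache.spark.storage.StorageLevel

// MEMORY_ONLY_SER keeps partitions as serialized byte arrays: a smaller
// footprint than MEMORY_ONLY at the cost of CPU to deserialize on access
val rdd = sc.parallelize(1 to 1000000)
rdd.persist(StorageLevel.MEMORY_ONLY_SER)
```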
Regards
On Wed, Jun 10, 2015 at 7:27 AM, Imran Rashid iras...@cloudera.com