Hi all,
I am experimenting with performance on big tasks locally, on a 32-core
node with more than 64 GB of RAM. Data is loaded from a database through
the JDBC driver, and I am launching heavy computations against it. I have
two questions:
1. My RDD is poorly distributed. I
Hi Saif,
Are you using JdbcRDD directly from Spark?
If yes, then the poor distribution could be due to the bound key you used.
See the JdbcRDD Scala doc at
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.JdbcRDD

sql: the text of the query. The query must contain [...]
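
To illustrate why the bound key matters: JdbcRDD takes a lowerBound, an upperBound and a numPartitions argument, and the two ? placeholders in the query are filled with the sub-range assigned to each partition. The sketch below is not Spark code. It is a standalone approximation of that kind of even range splitting, and splitRange, rowsPerPartition and the sample keys are hypothetical names and data made up for this example. It shows that when the key values are not spread evenly between the bounds, equal-width sub-ranges yield very unequal row counts.

```scala
object RangeSplitDemo {
  // Hypothetical helper (not part of Spark): split the inclusive key range
  // [lower, upper] into numPartitions contiguous sub-ranges of near-equal width.
  def splitRange(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
    val length = BigInt(upper) - BigInt(lower) + BigInt(1)
    val n = BigInt(numPartitions)
    (0 until numPartitions).map { i =>
      val start = BigInt(lower) + (BigInt(i) * length) / n
      val end = BigInt(lower) + (BigInt(i + 1) * length) / n - BigInt(1)
      (start.toLong, end.toLong)
    }
  }

  // Count how many keys fall into each inclusive sub-range.
  def rowsPerPartition(keys: Seq[Long], ranges: Seq[(Long, Long)]): Seq[Int] =
    ranges.map { case (start, end) => keys.count(k => k >= start && k <= end) }

  def main(args: Array[String]): Unit = {
    // Made-up skewed data: 100 dense ids plus one outlier that stretches the bounds.
    val keys: Seq[Long] = (1L to 100L) :+ 1000L
    val ranges = splitRange(keys.min, keys.max, 4)
    val counts = rowsPerPartition(keys, ranges)
    println(ranges.map { case (s, e) => s"[$s, $e]" }.mkString(", "))
    println(counts.mkString(", ")) // prints "100, 0, 0, 1"
  }
}
```

With these sample keys, one partition receives 100 of the 101 rows and two partitions receive none. Choosing a bound column whose values are roughly uniform between lowerBound and upperBound should give a more even spread.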
Saif
From: shenyan zhen [mailto:shenya...@gmail.com]
Sent: Tuesday, July 28, 2015 4:16 PM
To: Ellafi, Saif A.
Cc: user@spark.apache.org
Subject: Re: Fighting against performance: JDBC RDD badly distributed
Hi Saif,
Are you using JdbcRDD directly from Spark?
If yes, then the poor distribution could