Asmath,
Why is upperBound set to 300? How many cores do you have?
Check how the data is distributed in the Teradata DB table:

SELECT itm_bloon_seq_no, count(*) AS cc
FROM TABLE
GROUP BY itm_bloon_seq_no
ORDER BY itm_bloon_seq_no DESC;
Is the column "itm_bloon_seq_no" already in the table, or did you derive it on
the Spark side?
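For intuition, here is a rough sketch (not Spark's exact code; the object and
method names below are made up for illustration) of how Spark's JDBC source
turns lowerBound/upperBound/numPartitions into per-partition ranges on the
partition column. If most of the 2.5B rows fall into a few ranges, the
executors that get those ranges become stragglers:

```scala
// Simplified sketch of Spark's JDBC range partitioning
// (loosely modeled on JDBCRelation.columnPartition; names are hypothetical).
object PartitionSketch {
  // Split [lower, upper) into numPartitions contiguous ranges of equal stride.
  def ranges(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
    val stride = (upper - lower) / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lower + i * stride
      // Last partition absorbs any remainder up to upperBound.
      val hi = if (i == numPartitions - 1) upper else lo + stride
      (lo, hi)
    }
  }

  def main(args: Array[String]): Unit = {
    // With lowerBound=0, upperBound=300 and, say, 10 partitions, each task
    // scans a 30-wide slice of itm_bloon_seq_no. Equal-width slices only give
    // equal work if the values are uniformly distributed.
    ranges(0L, 300L, 10).foreach(println)
  }
}
```

That is why the distribution query above matters: the bounds split the value
range evenly, not the row count.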
Thanks,
Shyam
On Thu, May 2, 2019 at 11:30 PM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> I have a Teradata table with more than 2.5 billion records and a data size
> of around 600 GB. I am not able to pull it efficiently using Spark SQL; the
> job has been running for more than 11 hours. Here is my code:
>
> val df2 = sparkSession.read.format("jdbc")
>   .option("url", "jdbc:teradata://PROD/DATABASE=101")
>   .option("user", "HDFS_TD")
>   .option("password", "C")
>   .option("dbtable", "")
>   .option("numPartitions", partitions)
>   .option("driver", "com.teradata.jdbc.TeraDriver")
>   .option("partitionColumn", "itm_bloon_seq_no")
>   .option("lowerBound", config.getInt("lowerBound"))
>   .option("upperBound", config.getInt("upperBound"))
>   .load()
>
> The lower bound is 0 and the upper bound is 300. Spark uses multiple
> executors, but while most of them run fast, a few take much longer, maybe
> due to shuffling or something else.
>
> I also tried repartitioning on the column, but no luck. Is there a better
> way to load this faster?
>
> The object in Teradata is a view, not a table.
>
> Thanks,
> Asmath
>
>