Asmath,

Why is upperBound set to 300? How many cores do you have? Check how the data is distributed in the Teradata table:

SELECT itm_bloon_seq_no, count(*) AS cc
FROM TABLE
GROUP BY itm_bloon_seq_no
ORDER BY itm_bloon_seq_no DESC;
Is the column "itm_bloon_seq_no" already in the table, or did you derive it on the Spark side?

Thanks,
Shyam

On Thu, May 2, 2019 at 11:30 PM KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> I have a Teradata table with more than 2.5 billion records; the data size
> is around 600 GB. I am not able to pull it efficiently using Spark SQL, and
> the job has been running for more than 11 hours. Here is my code:
>
> val df2 = sparkSession.read.format("jdbc")
>   .option("url", "jdbc:teradata://PROD/DATABASE=XXXX101")
>   .option("user", "HDFS_TD")
>   .option("password", "CCCCC")
>   .option("dbtable", "XXXX")
>   .option("numPartitions", partitions)
>   .option("driver", "com.teradata.jdbc.TeraDriver")
>   .option("partitionColumn", "itm_bloon_seq_no")
>   .option("lowerBound", config.getInt("lowerBound"))
>   .option("upperBound", config.getInt("upperBound"))
>   .load()
>
> The lower bound is 0 and the upper bound is 300. Spark is using multiple
> executors, but most of them finish quickly while a few take much longer,
> possibly due to shuffling or something else.
>
> I also tried repartitioning on the column, but no luck. Is there a better
> way to load this fast?
>
> The object in Teradata is a view, not a table.
>
> Thanks,
> Asmath
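One likely cause of the uneven executors described above: Spark splits the range [lowerBound, upperBound) evenly into numPartitions strides, and every row whose itm_bloon_seq_no falls outside that range is read by the first or last partition. If the column's real values go far beyond 300, a handful of partitions do almost all the work. A sketch of deriving the bounds from the source itself, keeping the placeholders (XXXX, credentials, `partitions`) from the original post:

```scala
// Step 1: push a MIN/MAX aggregation down to Teradata via a subquery alias,
// so we learn the partition column's real range before the big read.
val bounds = sparkSession.read.format("jdbc")
  .option("url", "jdbc:teradata://PROD/DATABASE=XXXX101")
  .option("user", "HDFS_TD")
  .option("password", "CCCCC")
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .option("dbtable",
    "(SELECT MIN(itm_bloon_seq_no) AS lo, MAX(itm_bloon_seq_no) AS hi FROM XXXX) t")
  .load()
  .first()

val lo = bounds.getAs[Number]("lo").longValue
val hi = bounds.getAs[Number]("hi").longValue

// Step 2: use the observed range as the JDBC bounds, so the numPartitions
// strides actually cover the data instead of piling rows into edge partitions.
val df2 = sparkSession.read.format("jdbc")
  .option("url", "jdbc:teradata://PROD/DATABASE=XXXX101")
  .option("user", "HDFS_TD")
  .option("password", "CCCCC")
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .option("dbtable", "XXXX")
  .option("partitionColumn", "itm_bloon_seq_no")
  .option("lowerBound", lo)
  .option("upperBound", hi)
  .option("numPartitions", partitions)
  .load()
```

Even with correct bounds, the partitions stay uneven if the column's values are heavily skewed; in that case a more uniformly distributed numeric column (or a derived one, e.g. a hash of the key) is a better partitionColumn.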