Asmath,
Why is upperBound set to 300? How many cores do you have?
  Check how the data is distributed in the Teradata table (note: DISTINCT with
count(*) needs a GROUP BY instead):

SELECT itm_bloon_seq_no, count(*) AS cc
FROM TABLE
GROUP BY itm_bloon_seq_no
ORDER BY itm_bloon_seq_no DESC;
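The reason the distribution matters: Spark splits a JDBC read into numPartitions range predicates with a fixed stride between lowerBound and upperBound, so a skewed column leaves most partitions nearly empty. A simplified sketch of that stride logic (plain Scala, not Spark's actual internal code; column and bound values are just placeholders):

```scala
// Simplified sketch of how a JDBC source derives one WHERE clause per
// partition from lowerBound / upperBound / numPartitions.
object JdbcStride {
  def predicates(col: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
    val stride = (upper - lower) / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lower + i * stride
      val hi = lo + stride
      if (i == 0) s"$col < $hi OR $col IS NULL"                 // first partition catches everything below
      else if (i == numPartitions - 1) s"$col >= $lo"           // last partition catches everything above
      else s"$col >= $lo AND $col < $hi"
    }
  }
}
```

With lowerBound=0, upperBound=300 and, say, 10 partitions, the stride is 30; any rows whose itm_bloon_seq_no falls outside 0..300 all land in the first or last partition, which would explain a few slow executors.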

Is the column "itm_bloon_seq_no" already in the table, or did you derive it on
the Spark side?
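If the count query shows a skewed distribution, one option is the predicates overload of DataFrameReader.jdbc, where you hand Spark one WHERE clause per partition. A sketch of a helper that builds those clauses from hand-picked breakpoints (the breakpoints are hypothetical; you would take them from the count query above):

```scala
// Build one WHERE-clause predicate per partition from chosen breakpoints.
// Expects at least two breakpoints; values here are illustrative only.
object CustomPredicates {
  def fromBreaks(col: String, breaks: Seq[Long]): Array[String] = {
    // Middle ranges: [lo, hi) between consecutive breakpoints.
    val inner = breaks.sliding(2).collect { case Seq(lo, hi) =>
      s"$col >= $lo AND $col < $hi"
    }.toArray
    // First range also picks up NULLs; last range is open-ended.
    (s"$col < ${breaks.head} OR $col IS NULL" +: inner) :+ s"$col >= ${breaks.last}"
  }
}
```

The resulting array can be passed to sparkSession.read.jdbc(url, table, predicates, connectionProperties), which runs exactly one task per predicate, so you can give the dense value ranges narrower slices.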

Thanks,
Shyam


On Thu, May 2, 2019 at 11:30 PM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Hi,
>
> I have a Teradata table with more than 2.5 billion records, about 600 GB of
> data. I am not able to pull it efficiently using Spark SQL; the job has been
> running for more than 11 hours. Here is my code:
>
>       val df2 = sparkSession.read.format("jdbc")
>         .option("url", "jdbc:teradata://PROD/DATABASE=XXXX101")
>         .option("user", "HDFS_TD")
>         .option("password", "CCCCC")
>         .option("dbtable", "XXXX")
>         .option("numPartitions", partitions)
>         .option("driver", "com.teradata.jdbc.TeraDriver")
>         .option("partitionColumn", "itm_bloon_seq_no")
>         .option("lowerBound", config.getInt("lowerBound"))
>         .option("upperBound", config.getInt("upperBound"))
>         .load()
>
> Lower bound is 0 and upper bound is 300. Spark is using multiple executors,
> but while most of them finish quickly, a few take much longer, possibly due
> to shuffling or something else.
>
> I also tried repartitioning on the column, but no luck. Is there a better
> way to load this faster?
>
> The object in Teradata is actually a view, not a table.
>
> Thanks,
> Asmath
>
>
