Re: Spark SQL Teradata load is very slow

2019-05-03 Thread Shyam P
Asmath,
Why is upperBound set to 300? How many cores do you have?
Check how the data is distributed in the Teradata table, for example:

SELECT itm_bloon_seq_no, COUNT(*) AS cc
FROM TABLE
GROUP BY itm_bloon_seq_no
ORDER BY cc DESC;

Is the column "itm_bloon_seq_no" already in the table, or did you derive it
on the Spark side?

Thanks,
Shyam
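
A distribution check like that can also be pushed down from Spark by wrapping
it in a subquery passed to the jdbc reader's dbtable option. A minimal sketch;
the helper below is illustrative (not from the thread), and the usage assumes a
live SparkSession plus the Teradata JDBC driver on the classpath:

```scala
// Build a pushdown subquery that counts rows per partition-column value,
// so the skew check runs inside Teradata rather than in Spark.
// Note: GROUP BY (not DISTINCT) is needed to get a count per value.
def skewCheckQuery(table: String, col: String): String =
  s"(SELECT $col, COUNT(*) AS cc FROM $table GROUP BY $col) AS skew_check"

// Usage sketch (hypothetical table name; connection details from the thread):
// val skew = sparkSession.read.format("jdbc")
//   .option("url", "jdbc:teradata://PROD/DATABASE=101")
//   .option("driver", "com.teradata.jdbc.TeraDriver")
//   .option("dbtable", skewCheckQuery("MY_TABLE", "itm_bloon_seq_no"))
//   .load()
```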


On Thu, May 2, 2019 at 11:30 PM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Hi,
>
> I have a Teradata table with more than 2.5 billion records; the data size
> is around 600 GB. I am not able to pull it efficiently using Spark SQL, and
> the job has been running for more than 11 hours. Here is my code:
>
>   val df2 = sparkSession.read.format("jdbc")
>     .option("url", "jdbc:teradata://PROD/DATABASE=101")
>     .option("user", "HDFS_TD")
>     .option("password", "C")
>     .option("dbtable", "")
>     .option("numPartitions", partitions)
>     .option("driver", "com.teradata.jdbc.TeraDriver")
>     .option("partitionColumn", "itm_bloon_seq_no")
>     .option("lowerBound", config.getInt("lowerBound"))
>     .option("upperBound", config.getInt("upperBound"))
>     .load()  // .load() was missing; without it no DataFrame is created
>
> lowerBound is 0 and upperBound is 300. Spark is using multiple executors,
> but while most of them run fast, a few take much longer, maybe due to
> shuffling or something else.
>
> I also tried repartitioning on the column, but no luck. Is there a better
> way to load this faster?
>
> The object in Teradata is a view, not a table.
>
> Thanks,
> Asmath
>
>


Spark SQL Teradata load is very slow

2019-05-02 Thread KhajaAsmath Mohammed
Hi,

I have a Teradata table with more than 2.5 billion records; the data size
is around 600 GB. I am not able to pull it efficiently using Spark SQL, and
the job has been running for more than 11 hours. Here is my code:

  val df2 = sparkSession.read.format("jdbc")
    .option("url", "jdbc:teradata://PROD/DATABASE=101")
    .option("user", "HDFS_TD")
    .option("password", "C")
    .option("dbtable", "")
    .option("numPartitions", partitions)
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("partitionColumn", "itm_bloon_seq_no")
    .option("lowerBound", config.getInt("lowerBound"))
    .option("upperBound", config.getInt("upperBound"))
    .load()  // .load() was missing; without it no DataFrame is created

lowerBound is 0 and upperBound is 300. Spark is using multiple executors,
but while most of them run fast, a few take much longer, maybe due to
shuffling or something else.

I also tried repartitioning on the column, but no luck. Is there a better
way to load this faster?

The object in Teradata is a view, not a table.

Thanks,
Asmath
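
When a partition column is skewed, equal-width bounds leave a few executors
with most of the rows. Spark's DataFrameReader.jdbc also has an overload that
takes an explicit array of predicates, one per partition, so ranges can be
sized by hand. A minimal sketch: the helper builds equal-width predicates (a
real fix would size them using per-value counts), and the usage is a sketch
assuming a live SparkSession, with the view name MY_VIEW hypothetical:

```scala
// Build one non-overlapping WHERE clause per partition over [lower, upper].
// Each predicate becomes its own query, so bucket widths can be tuned
// (e.g. narrower buckets over hot value ranges) to counter skew.
def rangePredicates(col: String, lower: Long, upper: Long, n: Int): Seq[String] = {
  val step = math.max(1L, (upper - lower + n - 1) / n)  // ceiling division
  (0 until n).map { i =>
    val lo = lower + i * step
    val hi = lo + step
    if (i == n - 1) s"$col >= $lo"                      // last bucket open-ended
    else s"$col >= $lo AND $col < $hi"
  }
}

// Usage sketch (connection details taken from the thread):
// import java.util.Properties
// val props = new Properties()
// props.setProperty("user", "HDFS_TD")
// props.setProperty("driver", "com.teradata.jdbc.TeraDriver")
// val df = sparkSession.read.jdbc(
//   "jdbc:teradata://PROD/DATABASE=101",
//   "MY_VIEW",
//   rangePredicates("itm_bloon_seq_no", 0L, 300L, 50).toArray,
//   props)
```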


Re: Spark on teradata?

2015-01-08 Thread Reynold Xin
It depends on your use case. If it is to extract a small amount of data out
of Teradata, you can use JdbcRDD and, soon, a JDBC input source based on the
new Spark SQL external data source API.
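
For reference, JdbcRDD parallelizes a query by splitting a numeric key range
into one (start, end) pair per partition, bound to two '?' placeholders. A
minimal sketch; the helper mirrors the splitting idea (not a verbatim copy of
Spark's source), and the usage, with hypothetical table and column names,
assumes a SparkContext and a JDBC driver on the classpath:

```scala
// Split [lower, upper] into numPartitions inclusive (start, end) bounds,
// the values JdbcRDD substitutes into the query's two '?' placeholders.
def jdbcBounds(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
  val length = upper - lower + 1
  (0 until numPartitions).map { i =>
    val start = lower + (i * length) / numPartitions
    val end   = lower + ((i + 1) * length) / numPartitions - 1
    (start, end)
  }
}

// Usage sketch:
// import org.apache.spark.rdd.JdbcRDD
// import java.sql.DriverManager
// val rdd = new JdbcRDD(
//   sc,
//   () => DriverManager.getConnection("jdbc:teradata://host/DATABASE=db", "user", "pass"),
//   "SELECT id FROM my_table WHERE id >= ? AND id <= ?",
//   1L, 1000000L, 10,
//   (r: java.sql.ResultSet) => r.getLong(1))
```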



On Wed, Jan 7, 2015 at 7:14 AM, gen tang gen.tan...@gmail.com wrote:

 Hi,

 I have a stupid question:
 Is it possible to use Spark on a Teradata data warehouse? I read some
 news on the internet saying yes; however, I didn't find any examples of
 this.

 Thanks in advance.

 Cheers
 Gen




Re: Spark on teradata?

2015-01-08 Thread gen tang
Thanks a lot for your reply.
In fact, I need to work on almost all of the data in Teradata (~100 TB), so
I don't think JdbcRDD is a good choice.

Cheers
Gen


On Thu, Jan 8, 2015 at 7:39 PM, Reynold Xin r...@databricks.com wrote:

 It depends on your use case. If it is to extract a small amount of data out
 of Teradata, you can use JdbcRDD and, soon, a JDBC input source based on the
 new Spark SQL external data source API.



 On Wed, Jan 7, 2015 at 7:14 AM, gen tang gen.tan...@gmail.com wrote:

 Hi,

 I have a stupid question:
 Is it possible to use Spark on a Teradata data warehouse? I read some
 news on the internet saying yes; however, I didn't find any examples of
 this.

 Thanks in advance.

 Cheers
 Gen





Re: Spark on teradata?

2015-01-08 Thread Evan R. Sparks
Have you taken a look at TeradataDBInputFormat? Spark is compatible with
arbitrary Hadoop input formats, so this might work for you:
http://developer.teradata.com/extensibility/articles/hadoop-mapreduce-connector-to-teradata-edw

On Thu, Jan 8, 2015 at 10:53 AM, gen tang gen.tan...@gmail.com wrote:

 Thanks a lot for your reply.
 In fact, I need to work on almost all of the data in Teradata (~100 TB), so
 I don't think JdbcRDD is a good choice.

 Cheers
 Gen


 On Thu, Jan 8, 2015 at 7:39 PM, Reynold Xin r...@databricks.com wrote:

 It depends on your use case. If it is to extract a small amount of data out
 of Teradata, you can use JdbcRDD and, soon, a JDBC input source based on the
 new Spark SQL external data source API.



 On Wed, Jan 7, 2015 at 7:14 AM, gen tang gen.tan...@gmail.com wrote:

 Hi,

 I have a stupid question:
 Is it possible to use Spark on a Teradata data warehouse? I read some
 news on the internet saying yes; however, I didn't find any examples of
 this.

 Thanks in advance.

 Cheers
 Gen






Spark on teradata?

2015-01-07 Thread gen tang
Hi,

I have a stupid question:
Is it possible to use Spark on a Teradata data warehouse? I read some news
on the internet saying yes; however, I didn't find any examples of this.

Thanks in advance.

Cheers
Gen