[jira] [Issue Comment Deleted] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one

Charles Drotar (JIRA) Mon, 08 Feb 2016 03:30:12 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Charles Drotar updated SPARK-13156:
-----------------------------------
    Comment: was deleted

(was: Thanks Sean. The driver inhibiting the concurrent connections was the 
issue. Apparently the Teradata driver does not support concurrent connections 
and instead suggests creating different sessions for each query. I don't think 
this is truly an issue so I will close out the JIRA.)

> JDBC using multiple partitions creates additional tasks but only executes on 
> one
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-13156
>                 URL: https://issues.apache.org/jira/browse/SPARK-13156
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.5.0
>         Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
>            Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it 
> runs it creates a task on each executor for every partition. The problem is 
> that all of the tasks except for one complete within a couple seconds and the 
> final task handles the entire dataset.
> Example Code:
> private val properties = new java.util.Properties()
> properties.setProperty("driver", "com.teradata.jdbc.TeraDriver")
> properties.setProperty("username", "foo")
> properties.setProperty("password", "bar")
> val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> // The s-interpolator is needed so that $numPartitions is substituted.
> val dbTableTemp =
>   s"(SELECT id MOD $numPartitions AS modulo, id FROM db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0L
> val upperBound = (numPartitions - 1).toLong
> val df = sqlContext.read.jdbc(
>   url, dbTableTemp, partitionColumn, lowerBound, upperBound, numPartitions, properties)
> df.write.parquet("/output/path/for/df/")
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually 
> querying.
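
Independently of the driver behaviour discussed in this thread, it may be worth checking which WHERE clause each partition would receive for the bounds in the example. The following is a minimal sketch, assuming a stride-based scheme in which the stride is computed with integer division. It is a simplified model, not Spark's source, and the names PartitionSketch and wherePredicates are hypothetical. The exact logic varies between Spark versions, and the thread does not confirm whether this contributed to the reported behaviour.

```scala
// Hypothetical, simplified model of stride-based range partitioning.
// This is NOT Spark's implementation; names are illustrative only.
object PartitionSketch {
  def wherePredicates(
      column: String,
      lower: Long,
      upper: Long,
      numPartitions: Int): Seq[String] = {
    // Assumption: integer division. A bound range narrower than
    // numPartitions then yields a stride of 0.
    val stride = upper / numPartitions - lower / numPartitions
    (0 until numPartitions).map { i =>
      // The first partition has no lower bound, the last has no upper bound.
      val lo = if (i != 0) Some(s"$column >= ${lower + i * stride}") else None
      val hi =
        if (i != numPartitions - 1) Some(s"$column < ${lower + (i + 1) * stride}")
        else None
      List(lo, hi).flatten.mkString(" AND ")
    }
  }

  def main(args: Array[String]): Unit = {
    // Bounds from the example: lowerBound = 0, upperBound = numPartitions - 1.
    wherePredicates("modulo", 0L, 4L, 5).foreach(println)
  }
}
```

Under these assumptions, bounds of 0 and 4 with 5 partitions give a stride of 0, so the first four predicates match no rows and the last one ("modulo >= 0") matches every row. With an upper bound of 5 the stride becomes 1 and each modulo value falls into its own partition.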



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

