[jira] [Updated] (SPARK-24423) Add a new option `query` for JDBC sources

Xiao Li (JIRA) Tue, 29 May 2018 23:41:56 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-24423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Xiao Li updated SPARK-24423:
----------------------------
    Description: 
Currently, our JDBC connector provides the option `dbtable` for users to 
specify the to-be-loaded JDBC source table. 
{code} 
 // jdbcCredentials is a Map[String, String] of connection options.
 val jdbcDf = spark.read
   .format("jdbc")
   .option("dbtable", "dbName.tableName")
   .options(jdbcCredentials)
   .load()
{code} 
  
Users normally do not fetch the whole JDBC table because of the poor 
performance/throughput of JDBC. Instead, they usually fetch only a small 
subset of its rows. Advanced users can pass a subquery as the `dbtable` option. 
  
{code} 
 // The subquery has to be parenthesized and aliased so it can stand in for a table.
 val query = """ (select * from tableName limit 10) as tmp """
 val jdbcDf = spark.read
   .format("jdbc")
   .option("dbtable", query)
   .options(jdbcCredentials)
   .load()
{code} 
  
However, this is not straightforward for end users. We should simply allow 
users to specify the query through a new option `query`, and handle the 
complexity for them. 
  
{code} 
 // With the proposed option, the query is passed as-is, without parentheses or an alias.
 val query = """select * from tableName limit 10"""
 val jdbcDf = spark.read
   .format("jdbc")
   .option("query", query)
   .options(jdbcCredentials)
   .load()
{code} 
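  
One way the connector could "handle the complexity" is to do for the user what the `dbtable` workaround requires by hand: parenthesize the query and append an alias. The following is a minimal sketch under that assumption, not the actual Spark implementation. The names `QueryOptionSketch`, `toTableExpression`, and the alias prefix `spark_gen_subq_` are hypothetical. 
{code} 
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper for illustration only; not part of Spark's API.
object QueryOptionSketch {
  // Counter used to give every generated alias a distinct suffix within this JVM.
  private val nextId = new AtomicLong(0)

  // Wrap a bare query so it can be used where a table name is expected.
  def toTableExpression(query: String): String = {
    val trimmed = query.trim
    s"($trimmed) spark_gen_subq_${nextId.getAndIncrement()}"
  }
}

// Example: the value that would take the place of the hand-written `dbtable` string.
val tableExpr = QueryOptionSketch.toTableExpression("select * from tableName limit 10")
{code} 
The sketch omits the `as` keyword before the alias because some databases, such as Oracle, do not accept `as` in front of a table alias. 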
  
 Users are not allowed to specify query and dbtable at the same time. 
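  
The mutual-exclusion rule could be enforced when the options are parsed. The sketch below is one possible check, not Spark's actual code. `JdbcSourceOptionCheck` and `validate` are hypothetical names, and two behaviors are assumptions of this sketch rather than part of the proposal: option keys are matched case-insensitively, and supplying neither option is also rejected. 
{code} 
import java.util.Locale

// Hypothetical validation helper for illustration only; not part of Spark's API.
object JdbcSourceOptionCheck {
  // Returns Right(name of the option in use) or Left(error message).
  def validate(options: Map[String, String]): Either[String, String] = {
    // Assumption: option keys are compared case-insensitively.
    val keys = options.keySet.map(_.toLowerCase(Locale.ROOT))
    (keys.contains("dbtable"), keys.contains("query")) match {
      case (true, true)   => Left("Options `dbtable` and `query` cannot both be specified.")
      // Assumption: a source must name either a table or a query.
      case (false, false) => Left("One of `dbtable` or `query` must be specified.")
      case (true, false)  => Right("dbtable")
      case (false, true)  => Right("query")
    }
  }
}

// Example: specifying both options yields a Left with an error message.
val result = JdbcSourceOptionCheck.validate(
  Map("dbtable" -> "dbName.tableName", "query" -> "select * from tableName limit 10"))
{code} 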

  was:
Currently, our JDBC connector provides the option `dbtable` for users to 
specify the to-be-loaded JDBC source table. 
 
val jdbcDf = spark.read
  .format("jdbc")
  .option("*dbtable*", "dbName.tableName")
  .options(jdbcCredentials: Map)
  .load()
 
Normally, users do not fetch the whole JDBC table due to the poor 
performance/throughput of JDBC. Thus, they normally just fetch a small set of 
tables. For advanced users, they can pass a subquery as the option. 
 
val query = """ (select * from tableName limit 10) as tmp """
val jdbcDf = spark.read
  .format("jdbc")
  .option("*dbtable*", query)
  .options(jdbcCredentials: Map)
  .load()
 
However, this is straightforward to end users. We should simply allow users to 
specify the query by a new option `query`. We will handle the complexity for 
them. 
 
val query = """select * from tableName limit 10"""
val jdbcDf = spark.read
  .format("jdbc")
  .option("*{color:#ff0000}query{color}*", query)
  .options(jdbcCredentials: Map)
  .load()
 
Users are not allowed to specify query and dbtable at the same time. 


> Add a new option `query` for JDBC sources
> -----------------------------------------
>
>                 Key: SPARK-24423
>                 URL: https://issues.apache.org/jira/browse/SPARK-24423
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>            Priority: Major
>


