Re: How to limit search range without using subquery when query SQL DB via JDBC?

2016-05-13 Thread Mich Talebzadeh
Well, I don't know about Postgres, but you can limit the number of columns
and rows fetched via JDBC at the source rather than loading the whole table
and filtering it in Spark:

val c = HiveContext.load("jdbc",
  Map("url" -> _ORACLEserver,
      "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels WHERE ROWNUM <= 1)",
      "user" -> _username,
      "password" -> _password))

or in your case

"dbtable" -> "(SELECT COUNT(1) FROM FROM sh.channels where ROWNUM <=
1)",

c.show()
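
For the PostgreSQL/PySpark case in the question, a rough equivalent would be to push both the range filter and the count into the dbtable subquery, so PostgreSQL does the work and only one row comes back to Spark. This is only a sketch: the URL, table name and time range are taken from the original post, and the user/password placeholders are assumptions.

jdbcDF = sqlContext.read.format("jdbc").options(
  url="jdbc:postgresql://...",   # placeholder, as in the original post
  dbtable="(SELECT COUNT(1) AS cnt FROM mytable WHERE t >= 1451577600 AND t <= 1454256000) AS tmp",
  user="...",                    # hypothetical credentials
  password="...").load()
jdbcDF.show()   # the count is computed by PostgreSQL, not by Spark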

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 14 May 2016 at 04:56, Jyun-Fan Tsai  wrote:

> I'm trying to load some rows from a big SQL table.  Here is my code:
>
> ===
> jdbcDF = sqlContext.read.format("jdbc").options(
>   url="jdbc:postgresql://...",
>   dbtable="mytable",
>   partitionColumn="t",
>   lowerBound=1451577600,
>   upperBound=1454256000,
>   numPartitions=1).load()
> print(jdbcDF.count())
> ===
>
> The code runs very slowly because Spark tries to load the whole table.
> I know there is a solution that uses a subquery.  I can use:
>
> dbtable="(SELECT * FROM mytable WHERE t>=1451577600 AND t<= 1454256000)
> tmp".
> However, it's still slow because the subquery creates a temp table.
>
> I would like to know how I can specify WHERE filters so that I don't need
> to load the whole table.
>
> From the Spark source code, I guess the filter in JDBCRelation is the
> solution I'm looking for.  However, I don't know how to create a
> filter and pass it to the JDBC driver.
> ===
>
> https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
> ===
>
>
>
> --
> Thanks for help,
> Jyun-Fan Tsai


How to limit search range without using subquery when query SQL DB via JDBC?

2016-05-13 Thread Jyun-Fan Tsai
I'm trying to load some rows from a big SQL table.  Here is my code:

===
jdbcDF = sqlContext.read.format("jdbc").options(
  url="jdbc:postgresql://...",
  dbtable="mytable",
  partitionColumn="t",
  lowerBound=1451577600,
  upperBound=1454256000,
  numPartitions=1).load()
print(jdbcDF.count())
===

The code runs very slowly because Spark tries to load the whole table.
I know there is a solution that uses a subquery.  I can use:

dbtable="(SELECT * FROM mytable WHERE t>=1451577600 AND t<= 1454256000) tmp".
However, it's still slow because the subquery creates a temp table.

I would like to know how I can specify WHERE filters so that I don't need
to load the whole table.

From the Spark source code, I guess the filter in JDBCRelation is the
solution I'm looking for.  However, I don't know how to create a
filter and pass it to the JDBC driver.
===
https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
===
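
A minimal sketch of that approach, assuming a recent Spark version in which JDBCRelation pushes simple comparison filters down into the WHERE clause it sends to the database (the URL is a placeholder, as above):

===
jdbcDF = sqlContext.read.format("jdbc").options(
  url="jdbc:postgresql://...",   # placeholder, as in the code above
  dbtable="mytable").load()

# If the comparisons can be pushed down, only rows with t in this range
# are fetched from PostgreSQL instead of the whole table.
filtered = jdbcDF.filter("t >= 1451577600 AND t <= 1454256000")
print(filtered.count())
===

Alternatively, sqlContext.read.jdbc() takes a predicates argument (a list of WHERE-clause strings, one per partition), which also restricts what is read from the database.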



-- 
Thanks for help,
Jyun-Fan Tsai

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org