One more thing I feel would help maintainability: create a DB view and then
use the view from Spark. This avoids burying complicated SQL queries within
application code. A rough sketch of what that could look like:
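Just a sketch, mind you; the view name dbo.docs_v and the connection details
below are made up, and I haven't tested this:

// One-time DDL on the SQL Server side:
//   CREATE VIEW dbo.docs_v AS
//     SELECT docid, title, docText
//     FROM dbo.document
//     WHERE docid BETWEEN 10 AND 1000;

// Spark then reads the view as if it were a plain table:
val docsDF = sqlContext.read.format("jdbc")
  .options(Map(
    "url" -> "jdbc:sqlserver://dbserver;databaseName=mydb", // plus credentials
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "dbtable" -> "dbo.docs_v"))
  .load()

If the selection logic changes later, you only update the view; the Spark
code stays untouched.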
On 8 Dec 2015 05:55, "Wang, Ningjun (LNG-NPV)" <ningjun.w...@lexisnexis.com>
wrote:

> This is a very helpful article. Thanks for the help.
>
>
>
> Ningjun
>
>
>
> From: Sujit Pal [mailto:sujitatgt...@gmail.com]
> Sent: Monday, December 07, 2015 12:42 PM
> To: Wang, Ningjun (LNG-NPV)
> Cc: user@spark.apache.org
> Subject: Re: How to create dataframe from SQL Server SQL query
>
>
>
> Hi Ningjun,
>
>
>
> Haven't done this myself, saw your question and was curious about the
> answer and found this article which you might find useful:
>
>
> http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/
>
>
>
> According to this article, you can pass in your SQL statement in the
> "dbtable" mapping, i.e., something like:
>
>
>
> val jdbcDF = sqlContext.read.format("jdbc")
>   .options(Map(
>     "url" -> "jdbc:postgresql:dbserver",
>     "dbtable" -> "(select docid, title, docText from dbo.document where docid between 10 and 1000) as doc"))
>   .load()
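>
> Note that Spark substitutes the "dbtable" value into a FROM clause, so a
> parenthesized query needs a table alias, hence the ") as doc" above (the
> alias name is arbitrary). Once loaded, jdbcDF should behave like any other
> DataFrame, e.g.:
>
> jdbcDF.select("docid", "title").show()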
>
>
>
> -sujit
>
>
>
> On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <
> ningjun.w...@lexisnexis.com> wrote:
>
> How can I create an RDD from a SQL query against a SQL Server database?
> Here is the dataframe example:
>
>
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#overview
>
>
>
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:dbserver",
>     "dbtable" -> "schema.tablename")).load()
>
>
>
> This code creates a dataframe from a table. How can I create a dataframe
> from a query, e.g. “select docid, title, docText from dbo.document where
> docid between 10 and 1000”?
>
>
>
> Ningjun