One more thing: for better maintainability, you could create a DB view and then use that view in Spark. This avoids burying complicated SQL queries inside application code.

On 8 Dec 2015 05:55, "Wang, Ningjun (LNG-NPV)" <ningjun.w...@lexisnexis.com> wrote:
> This is a very helpful article. Thanks for the help.
>
> Ningjun
>
> *From:* Sujit Pal [mailto:sujitatgt...@gmail.com]
> *Sent:* Monday, December 07, 2015 12:42 PM
> *To:* Wang, Ningjun (LNG-NPV)
> *Cc:* user@spark.apache.org
> *Subject:* Re: How to create dataframe from SQL Server SQL query
>
> Hi Ningjun,
>
> Haven't done this myself, but I saw your question, was curious about the
> answer, and found this article which you might find useful:
>
> http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/
>
> According to this article, you can pass your SQL statement in the
> "dbtable" mapping, i.e., something like:
>
>     val jdbcDF = sqlContext.read.format("jdbc")
>       .options(
>         Map("url" -> "jdbc:postgresql:dbserver",
>             "dbtable" -> "(select docid, title, docText from dbo.document where docid between 10 and 1000)"
>       )).load
>
> -sujit
>
> On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
>
> > How can I create an RDD from a SQL query against a SQL Server database? Here
> > is the dataframe example from
> >
> > http://spark.apache.org/docs/latest/sql-programming-guide.html#overview
> >
> >     val jdbcDF = sqlContext.read.format("jdbc").options(
> >       Map("url" -> "jdbc:postgresql:dbserver",
> >           "dbtable" -> "schema.tablename")).load()
> >
> > This code creates a dataframe from a table. How can I create a dataframe from a
> > query, e.g. "select docid, title, docText from dbo.document where docid
> > between 10 and 1000"?
> >
> > Ningjun
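Pulling the two suggestions in this thread together, here is a hedged sketch in Scala. The JDBC URL, database name, view name, and alias below are illustrative assumptions, not values from the thread. Part (a) shows the subquery-as-`dbtable` approach from the article, wrapped with an alias because Spark embeds the `dbtable` value in a FROM clause and SQL Server rejects unaliased derived tables; part (b) shows the view-based alternative from the top reply.

```scala
// Sketch only: the URL, database, view, and alias names are assumptions.

// (a) Passing a query as "dbtable". Spark substitutes this string into a
// FROM clause, so a subquery should carry an alias for SQL Server.
val query =
  "select docid, title, docText from dbo.document where docid between 10 and 1000"
val subqueryOptions = Map(
  "url"     -> "jdbc:sqlserver://dbserver;databaseName=mydb",
  "dbtable" -> s"($query) as docs"
)
// val df = sqlContext.read.format("jdbc").options(subqueryOptions).load()

// (b) The view-based alternative: create the view once on the server side ...
//   CREATE VIEW dbo.document_slice AS
//     SELECT docid, title, docText FROM dbo.document
//     WHERE docid BETWEEN 10 AND 1000
// ... then read it like any plain table, keeping the SQL out of the app code:
val viewOptions = Map(
  "url"     -> "jdbc:sqlserver://dbserver;databaseName=mydb",
  "dbtable" -> "dbo.document_slice"
)
// val dfFromView = sqlContext.read.format("jdbc").options(viewOptions).load()
```

Either way, the resulting DataFrame only sees the projected columns and the row range, so Spark never pulls the whole table across the wire.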