Hi Ningjun,

I haven't done this myself, but I saw your question, was curious about the answer, and found this article which you might find useful: http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/
According to this article, you can pass your SQL statement as a parenthesized subquery in the "dbtable" option (note that most databases require the derived table to have an alias), i.e. something like:

val jdbcDF = sqlContext.read.format("jdbc")
  .options(Map(
    "url" -> "jdbc:postgresql:dbserver",
    "dbtable" -> "(select docid, title, docText from dbo.document where docid between 10 and 1000) tmp"
  ))
  .load()

-sujit

On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:

> How can I create an RDD from a SQL query against a SQLServer database?
> Here is the dataframe example from
> http://spark.apache.org/docs/latest/sql-programming-guide.html#overview
>
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:dbserver",
>     "dbtable" -> "schema.tablename")).load()
>
> This code creates a dataframe from a table. How can I create a dataframe
> from a query, e.g. "select docid, title, docText from dbo.document where
> docid between 10 and 1000"?
>
> Ningjun
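To make the pattern concrete, here is a minimal sketch of building the "dbtable" value from an arbitrary query. The helper name (asSubquery) and the alias are illustrative, not part of the Spark API; the idea is just that the option value must read like a table expression, which for a subquery means parentheses plus an alias:

```scala
// Sketch: wrap an arbitrary SQL query so it can be passed as the JDBC
// "dbtable" option. The helper and alias name are hypothetical.
object JdbcQuery {
  // Most databases require a derived table to carry an alias, so we
  // append one after the parenthesized query.
  def asSubquery(query: String, alias: String = "subq"): String =
    s"(${query.trim}) $alias"

  def main(args: Array[String]): Unit = {
    val dbtable = asSubquery(
      "select docid, title, docText from dbo.document where docid between 10 and 1000")
    println(dbtable)
    // The resulting string would then be used as:
    // sqlContext.read.format("jdbc")
    //   .options(Map("url" -> "jdbc:postgresql:dbserver", "dbtable" -> dbtable))
    //   .load()
  }
}
```

Spark simply splices this string into "SELECT ... FROM <dbtable>", which is why a bare query without parentheses and an alias fails.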