Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread ayan guha
Yes, it is possible. You need to use the jsonFile method on the SQLContext and then create a DataFrame from the RDD. Then register it as a table. It should be about three lines of code, thanks to Spark. You may also want to watch a few YouTube videos, especially on unifying pipelines.
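The three lines described above might look like the following sketch, assuming a Spark 1.3-era PySpark API and a JSON path that is visible at the same location on every worker; the path and app name are placeholders, not from the thread.

```python
# Hypothetical sketch: Spark 1.3-era API; "/data/events.json" must exist
# at the same path on every worker node.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="json-sql")      # app name is an assumption
sqlContext = SQLContext(sc)

# jsonFile() reads newline-delimited JSON and infers the schema.
df = sqlContext.jsonFile("/data/events.json")
df.registerTempTable("events")             # expose the DataFrame to SQL

# The query is distributed across the workers holding the data.
results = sqlContext.sql("SELECT * FROM events")
results.show()
```

This is only a sketch of the workflow under those assumptions; on later Spark versions the equivalent calls are spark.read.json() and createOrReplaceTempView().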

Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Jai
Hi, I am a noob to Spark and related technology. I have JSON stored at the same location on all worker nodes of the Spark cluster. I am looking to load this JSON data set on these nodes and run SQL queries against it, like distributed SQL. Is it possible to achieve this? Right now, the master submits the task to one node only.

Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Ted Yu
Looking at SQLContext.scala (in the master branch), jsonFile() returns a DataFrame directly: def jsonFile(path: String, samplingRatio: Double): DataFrame = ... FYI

Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Dean Wampler
Note that each JSON object has to be on a single line in the files.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
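The one-object-per-line requirement can be illustrated with plain Python: Spark's jsonFile splits the input by line, so each line must be a complete JSON document on its own, and a pretty-printed object spanning several lines breaks per-line parsing. The sample records below are invented for illustration.

```python
import json

# Newline-delimited JSON: one complete object per line, as jsonFile expects.
good = '{"name": "a", "value": 1}\n{"name": "b", "value": 2}\n'
records = [json.loads(line) for line in good.splitlines()]
print([r["name"] for r in records])  # each line parsed independently

# A pretty-printed object spanning several lines fails per-line parsing:
bad = '{\n  "name": "a",\n  "value": 1\n}\n'
failed = 0
for line in bad.splitlines():
    try:
        json.loads(line)
    except ValueError:
        failed += 1  # each line is only a fragment, not a valid document
print(failed)
```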