Yes, it is possible. You need to use the jsonFile method on the SQL context and then
create a DataFrame from the RDD. Then register it as a table. It should be 3
lines of code, thanks to Spark.
You may want to watch a few YouTube videos, especially on unifying pipelines.
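A minimal sketch of those three lines, assuming Spark 1.3+, an existing SparkContext named `sc`, and a hypothetical input path `/data/people.json`:

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc`; the path below is illustrative.
val sqlContext = new SQLContext(sc)

// jsonFile reads one JSON object per line, infers the schema,
// and returns a DataFrame directly.
val df = sqlContext.jsonFile("/data/people.json")

// Register the DataFrame as a temp table so it can be queried with SQL.
df.registerTempTable("people")

val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 21")
adults.show()
```

Because the query runs through Spark SQL, the work is distributed across the executors rather than handled on one node.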
On 3 May 2015 19:02, Jai jai4l...@gmail.com wrote:
Hi,
I am a noob to Spark and related technology.
I have JSON stored at the same location on all worker nodes of the Spark cluster. I
am looking to load this JSON data set on those nodes and run SQL queries against it, like
distributed SQL.
Is this possible to achieve?
Right now, the master submits the task to one node only.
Looking at SQLContext.scala (in the master branch), jsonFile() returns a
DataFrame directly:
def jsonFile(path: String, samplingRatio: Double): DataFrame =
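For instance, the two-argument overload above lets you infer the schema from only a sample of the records, which can speed things up on large files (assuming a SQLContext named `sqlContext`; the path is illustrative):

```scala
// Infer the schema by sampling roughly 10% of the input records.
val df = sqlContext.jsonFile("/data/events.json", 0.1)
```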
FYI
On Sun, May 3, 2015 at 2:14 AM, ayan guha guha.a...@gmail.com wrote:
Yes it is possible. You need to use jsonfile method on SQL context
Note that each JSON object has to be on a single line in the files.
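That is, the input files must be in a one-object-per-line shape (contents illustrative):

```
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
```

A single pretty-printed, multi-line JSON object will not parse correctly with jsonFile.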
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com