I've seen this done using mapPartitions(), where each partition represents a single, multi-line JSON file. You can read through all the lines of each partition (JSON file), join them, and parse the JSON document as a whole.
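A rough sketch of what I mean, in case it helps. It assumes json4s with the Jackson backend (your snippet doesn't show which `parse` you import), and the object and method names (`MultiLineJson`, `extractName`, `namesInPartition` and so on) are made up for illustration. The mapPartitions variant only works if every partition holds exactly one whole file, which Spark does not promise, so I've also included a variant based on sc.wholeTextFiles(), which hands you each file as a single (path, content) record.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.json4s._
import org.json4s.jackson.JsonMethods.parse // assumption: Jackson backend

object MultiLineJson {
  // Parse one complete JSON document and pull out its "name" field.
  def extractName(doc: String): String = {
    implicit val formats: Formats = DefaultFormats
    (parse(doc) \ "name").extract[String]
  }

  // Join every line of the partition into a single document and parse it.
  // ASSUMPTION: the partition contains exactly one whole JSON file.
  def namesInPartition(lines: Iterator[String]): Iterator[String] = {
    val doc = lines.mkString("\n")
    if (doc.trim.isEmpty) Iterator.empty else Iterator(extractName(doc))
  }

  // mapPartitions approach: glob the files, one document per partition.
  def namesViaPartitions(sc: SparkContext, glob: String): RDD[String] =
    sc.textFile(glob).mapPartitions(namesInPartition)

  // Alternative that does not depend on where partition boundaries fall:
  // each record is a (path, fullFileContent) pair.
  def namesViaWholeFiles(sc: SparkContext, dir: String): RDD[String] =
    sc.wholeTextFiles(dir).map { case (_, content) => extractName(content) }

  def main(args: Array[String]): Unit = {
    // Master URL and other settings are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("MultiLineJson"))
    try namesViaPartitions(sc, args(0)).collect().foreach(println)
    finally sc.stop()
  }
}
```

If a file does get split across partitions, the mapPartitions variant will fail with a parse error on the fragments, so the wholeTextFiles variant is probably the safer one to start from.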
This assumes you use sc.textFile("<path>/*.json") or equivalent to load multiple files at once. Each JSON file will then typically be its own partition, though Spark does not guarantee that, since a large file can be split across several partitions. Not sure if this satisfies your use case, but it might be a good starting point.

-chris

On Mon, Jul 14, 2014 at 2:55 PM, SK <skrishna...@gmail.com> wrote:

> Hi,
>
> I have a json file where the definition of each object spans multiple
> lines.
> An example of one object definition appears below.
>
> {
>   "name": "16287e9cdf",
>   "width": 500,
>   "height": 325,
>   "width": 1024,
>   "height": 665,
>   "obj": [
>     {
>       "x": 395.08,
>       "y": 82.09,
>       "w": 185.48677,
>       "h": 185.48677,
>       "min": 50,
>       "max": 59,
>       "attr1": 2,
>       "attr2": 68,
>       "attr3": 8
>     },
>     {
>       "x": 519.1,
>       "y": 225.8,
>       "w": 170,
>       "h": 171,
>       "min": 20,
>       "max": 29,
>       "attr1": 7,
>       "attr2": 93,
>       "attr3": 10
>     }
>   ]
> }
>
> I used the following Spark code to parse the file. However, the parsing is
> failing because I think it expects one Json object definition per line. I
> can try to preprocess the input file to remove the new lines, but I would
> like to know if it is possible to parse a Json object definition that spans
> multiple lines, directly in Spark.
>
> val inp = sc.textFile(args(0))
> val res = inp.map(line => { parse(line) })
>              .map(json =>
>                 {
>                    implicit lazy val formats =
>                        org.json4s.DefaultFormats
>                    val image = (json \ "name").extract[String]
>                 }
>              )
>
>
> Thanks for your help.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Parsing-Json-object-definition-spanning-multiple-lines-tp9659.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.