I've seen this done using mapPartitions(), where each partition holds a
single multi-line JSON file. You can rip through each partition (JSON
file), join its lines back together, and parse the JSON doc as a whole.

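This assumes you use sc.textFile("<path>/*.json") or equivalent to load
multiple files at once, so that each JSON file ends up in its own partition.
That isn't guaranteed, though: Hadoop's input format can split one file into
several partitions (for example, a file larger than the filesystem block
size), and then a partition no longer holds a complete document.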
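Here is a minimal sketch of the mapPartitions() approach. It assumes a Spark 1.x job with spark-core and json4s-jackson on the classpath, and it assumes each partition holds exactly one complete JSON file. The object name MultiLineJsonByPartition and the helper namesFromPartition are hypothetical. Like the original question, it only pulls out the top-level "name" field.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.json4s._
import org.json4s.jackson.JsonMethods._

object MultiLineJsonByPartition {
  // Hypothetical helper. Assumes the partition holds exactly one complete
  // JSON file: joins its lines back into one document, parses it, and
  // returns the top-level "name" field.
  def namesFromPartition(lines: Iterator[String]): Iterator[String] = {
    implicit val formats: Formats = DefaultFormats
    val doc = lines.mkString("\n")
    if (doc.trim.isEmpty) Iterator.empty // empty partition or empty file
    else Iterator((parse(doc) \ "name").extract[String])
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-line-json"))
    // args(0) is a glob such as "<path>/*.json"
    val names = sc.textFile(args(0)).mapPartitions(namesFromPartition)
    names.collect().foreach(println)
    sc.stop()
  }
}
```

If a file does get split across partitions, parse() will fail on the fragments. This only works for files small enough to land in a single partition.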

Not sure if this satisfies your use case, but it might be a good starting
point.
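You may not want to depend on how files map to partitions. In that case, SparkContext.wholeTextFiles() (added in Spark 1.0) returns one (path, contents) pair per file, so each record is already a complete document. Here is another minimal sketch under the same assumptions (Spark 1.x, json4s-jackson on the classpath). MultiLineJsonWholeFiles and nameOf are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.json4s._
import org.json4s.jackson.JsonMethods._

object MultiLineJsonWholeFiles {
  // Hypothetical helper: parses one complete JSON document (the full
  // contents of one file) and returns its top-level "name" field.
  def nameOf(doc: String): String = {
    implicit val formats: Formats = DefaultFormats
    (parse(doc) \ "name").extract[String]
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-line-json-whole"))
    // wholeTextFiles yields one (path, fileContents) pair per file,
    // regardless of how the files would have been split by textFile.
    val names = sc.wholeTextFiles(args(0)).map { case (_, contents) => nameOf(contents) }
    names.collect().foreach(println)
    sc.stop()
  }
}
```

The trade-off is that each file's full contents become a single string in memory. That is fine for many small JSON files but not for very large ones.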

-chris


On Mon, Jul 14, 2014 at 2:55 PM, SK <skrishna...@gmail.com> wrote:

> Hi,
>
> I have a JSON file in which the definition of each object spans multiple
> lines. An example of one object definition appears below.
>
> {
>     "name": "16287e9cdf",
>     "width": 500,
>     "height": 325,
>     "width": 1024,
>     "height": 665,
>     "obj": [
>       {
>         "x": 395.08,
>         "y": 82.09,
>         "w": 185.48677,
>         "h": 185.48677,
>         "min": 50,
>         "max": 59,
>         "attr1": 2,
>         "attr2": 68,
>         "attr3": 8
>       },
>       {
>         "x": 519.1,
>         "y": 225.8,
>         "w": 170,
>         "h": 171,
>         "min": 20,
>         "max": 29,
>         "attr1": 7,
>         "attr2": 93,
>         "attr3": 10
>       }
>     ]
> }
>
> I used the following Spark code to parse the file. However, parsing fails,
> I think because it expects one JSON object definition per line. I could
> preprocess the input file to remove the newlines, but I would like to know
> whether it is possible to parse a JSON object definition that spans
> multiple lines directly in Spark.
>
> import org.json4s._
> import org.json4s.jackson.JsonMethods._
>
> val inp = sc.textFile(args(0))
> val res = inp.map(line => parse(line))
>              .map { json =>
>                implicit val formats: Formats = DefaultFormats
>                (json \ "name").extract[String]
>              }
>
>
> Thanks for your help.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Parsing-Json-object-definition-spanning-multiple-lines-tp9659.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
