On Sun, Nov 30, 2014 at 1:10 PM, Peter Vandenabeele <pe...@vandenabeele.com>
wrote:

> On spark 1.1.0 in Standalone mode, I am following
>
>
> https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets
>
> to try to load a simple test JSON file (on my local filesystem, not in
> hdfs).
> The file is below and was validated with jsonlint.com:
>
> ➜  tmp  cat test_4.json
> {"foo":
>     [{
>         "bar": {
>             "id": 31,
>             "name": "bar"
>         }
>     },{
>         "tux": {
>             "id": 42,
>             "name": "tux"
>         }
>     }]
> }
> ➜  tmp  wc test_4.json
>       13      19     182 test_4.json
>
>
I should have read the manual better (#rtfm):

  "jsonFile - loads data from a directory of JSON files where _each line of
the files is a JSON object_."  (emphasis mine)

So, what works is:

$ cat test_6.json   # ==> Success  (count() => 3)
{"foo":"bar"}
{"foo":"tux"}
{"foo":"ping"}

and what fails is:

$ cat test_7.json   # ==> Fail (JsonMappingException)
[
  {"foo":"bar"},
  {"foo":"tux"},
  {"foo":"ping"}
]
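
A simple workaround (a sketch in plain Python with the stdlib json module;
the function name is mine, not a Spark API) is to rewrite the array file into
the one-object-per-line format that jsonFile expects:

```python
import json

def array_to_json_lines(array_text):
    """Convert a valid JSON array (like test_7.json) into
    one JSON object per line, the format jsonFile expects."""
    records = json.loads(array_text)
    return "\n".join(json.dumps(r) for r in records)

test_7 = '[{"foo":"bar"},{"foo":"tux"},{"foo":"ping"}]'
print(array_to_json_lines(test_7))
# {"foo": "bar"}
# {"foo": "tux"}
# {"foo": "ping"}
```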


I got confused by the fact that test_6.json is _not_ valid JSON as a whole
file (though each line is a valid JSON object, which is what jsonFile parses)

while test_7.json is a _valid_ JSON array (and does not work for this).
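
The difference can be demonstrated with plain Python (a sketch, not the Spark
implementation; the helper name is mine): parsing one value per line succeeds
on test_6-style input and fails on the first line of a pretty-printed array:

```python
import json

# Inline copies of the two test files from above.
test_6 = '{"foo":"bar"}\n{"foo":"tux"}\n{"foo":"ping"}\n'
test_7 = '[\n  {"foo":"bar"},\n  {"foo":"tux"},\n  {"foo":"ping"}\n]\n'

def parse_json_lines(text):
    """Parse one JSON value per non-empty line, as jsonFile does."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

print(parse_json_lines(test_6))   # three records
# parse_json_lines(test_7) raises an error on the first line, "["
```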


I will see if I can contribute some note in the documentation.

In any case, it might be better not to _name_ the file in the tutorial

examples/src/main/resources/people.json

because it is not actually a valid JSON file, but a file with one JSON object
per line.

Maybe

examples/src/main/resources/people.jsons

is a better name (analogous to the 'xs' plural naming convention in Scala).

Also, just showing an example of people.jsons could avoid future confusion.

Thanks,

Peter

-- 
Peter Vandenabeele
http://www.allthingsdata.io
