On Sun, Nov 30, 2014 at 1:10 PM, Peter Vandenabeele <pe...@vandenabeele.com> wrote:
> On spark 1.1.0 in Standalone mode, I am following
>
> https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets
>
> to try to load a simple test JSON file (on my local filesystem, not in
> hdfs).
> The file is below and was validated with jsonlint.com:
>
> ➜ tmp cat test_4.json
> {"foo":
> [{
> "bar": {
> "id": 31,
> "name": "bar"
> }
> },{
> "tux": {
> "id": 42,
> "name": "tux"
> }
> }]
> }
> ➜ tmp wc test_4.json
> 13 19 182 test_4.json
>

I should have read the manual better (#rtfm):

"jsonFile - loads data from a directory of JSON files where _each line
of the files is a JSON object_." (emphasis mine)

So, what works is:

$ cat test_6.json # ==> Success (count() => 3)
{"foo":"bar"}
{"foo":"tux"}
{"foo":"ping"}

and what fails is:

$ cat test_7.json # ==> Fail (JsonMappingException)
[
{"foo":"bar"},
{"foo":"tux"},
{"foo":"ping"}
]

I got confused by the fact that test_6.json as a whole is _not_ valid JSON
(but works with jsonFile), while test_7.json is a _valid_ JSON array (and
does not work with jsonFile).

I will see if I can contribute a note to the documentation.

In any case, it might be better not to _name_ the file in the tutorial

examples/src/main/resources/people.json

because it is not actually a valid JSON file (but a file with a JSON
object on each line).

Maybe

examples/src/main/resources/people.jsons

is a better name (equivalent to the 'xs' that is used as a convention in
Scala).

Also, just showing an example of people.jsons could avoid future
confusion.

Thanks,

Peter

--
Peter Vandenabeele
http://www.allthingsdata.io
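
For anyone hitting the same error, here is a minimal sketch (plain Python, no Spark involved) of one way to turn a JSON array like test_7.json into the one-object-per-line layout of test_6.json. The function name json_array_to_lines is hypothetical and only for illustration; it assumes the whole array fits in memory.

```python
import json


def json_array_to_lines(text):
    """Convert a top-level JSON array into one JSON value per line.

    Hypothetical helper for illustration; assumes the input is small
    enough to parse in memory.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep each record on a single line without
    # extra spaces, matching the layout of test_6.json.
    lines = [json.dumps(record, separators=(",", ":")) for record in records]
    return "\n".join(lines) + "\n"


# Same content as test_7.json from the message above.
array_text = '[\n{"foo":"bar"},\n{"foo":"tux"},\n{"foo":"ping"}\n]'

print(json_array_to_lines(array_text), end="")
```

Writing that output to a file should give something with the same shape as test_6.json, which is the form jsonFile expects according to the guide quoted above.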