Re: Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-20 Thread Steve Loughran
> On 19 Oct 2016, at 21:46, Jakob Odersky wrote:
>
> Another reason I could imagine is that files are often read from HDFS,
> which by default uses line terminators to separate records.
>
> It is possible to implement your own hdfs delimiter finder, however
> for arbitrary

Re: Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-19 Thread Jakob Odersky
Another reason I could imagine is that files are often read from HDFS, which by default uses line terminators to separate records. It is possible to implement your own HDFS delimiter finder; however, for arbitrary JSON data, finding that delimiter would require stateful parsing of the file and
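Jakob's point about stateful parsing can be illustrated with a small sketch. A newline-delimited reader can start at any byte offset (as a reader of one HDFS split does) and resynchronize at the next line terminator. With pretty-printed JSON, a newline or a `}` character says nothing on its own: locating the end of a record means tracking nesting depth, whether the scanner is inside a string literal, and backslash escapes, all from the start of the data. The function below is a hypothetical helper written for illustration; it is not Spark's or Hadoop's code, and it assumes well-formed input made of concatenated top-level objects or arrays.

```python
import json
from typing import List


def split_top_level_objects(text: str) -> List[str]:
    """Split concatenated top-level JSON objects/arrays into separate strings.

    Hypothetical helper for illustration. It assumes well-formed input and
    must scan from the beginning, tracking nesting depth, whether it is inside
    a string literal, and whether the previous character was a backslash.
    """
    records = []
    depth = 0
    in_string = False
    escaped = False
    start = 0
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; skip it
            elif ch == "\\":
                escaped = True       # the next character is escaped
            elif ch == '"':
                in_string = False    # closing quote of a string literal
            continue
        if ch == '"':
            in_string = True         # braces inside strings must be ignored
        elif ch in "{[":
            if depth == 0:
                start = i            # a new top-level record begins here
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                records.append(text[start:i + 1])
    return records


# Pretty-printed records: a "}" can appear inside a string, and records span
# several lines, so neither newlines nor braces alone mark record boundaries.
sample = '{\n  "id": 1,\n  "q": "he said \\"}\\""\n}\n{\n  "id": 2,\n  "tags": ["x", "y"]\n}\n'
for record in split_top_level_objects(sample):
    print(json.loads(record)["id"])  # prints 1, then 2
```

The scanner only works because it starts at offset 0; dropped into the middle of the sample, it could not tell whether it had landed inside a string, which is exactly the state a line-oriented split reader never needs.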

Re: Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-18 Thread Hyukjin Kwon
Regarding his recent PR [1], I guess he meant multi-line JSON. As far as I know, single-line JSON also complies with the standard. I left a comment citing the RFC in the PR, but please let me know if I am wrong at any point. Thanks!
[1] https://github.com/apache/spark/pull/15511
On 19 Oct 2016 7:00 a.m.,
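To make Hyukjin's point concrete: a JSON text is a single value, and whitespace between tokens is insignificant, so an object written on one line is just as standard as the same object spread over several lines. What is not a single JSON text is the file as a whole, because it holds several values one after another. The hypothetical helpers below use only Python's standard `json` module: each line parses on its own, while parsing the whole file as one document fails.

```python
import json


def parse_json_lines(text):
    """Parse each non-blank line as an independent JSON text."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def is_single_json_text(text):
    """Return True if the whole string parses as exactly one JSON text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical file contents in the one-object-per-line layout.
json_lines = '{"name": "a", "n": 1}\n{"name": "b", "n": 2}\n'

print(parse_json_lines(json_lines))  # [{'name': 'a', 'n': 1}, {'name': 'b', 'n': 2}]
print(is_single_json_text(json_lines))  # False: extra data after the first object
print(is_single_json_text('{"name": "a", "n": 1}'))  # True
```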

Re: Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-18 Thread Daniel Barclay
Koert,
Koert Kuipers wrote:
> A single json object would mean for most parsers it needs to fit in memory when reading or writing
Note that codlife didn't seem to be asking about /single-object/ JSON files, but about /standard-format/ JSON files.
On Oct 15, 2016 11:09, "codlife"

Re: Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-16 Thread Koert Kuipers
A single JSON object would mean that, for most parsers, it needs to fit in memory when reading or writing.
On Oct 15, 2016 11:09, "codlife" <1004910...@qq.com> wrote:
> Hi:
> I'm doubt about the design of spark.read.json, why the json file is not
> a standard json file, who can tell me the internal
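Koert's memory argument can be sketched with Python's standard `json` module, used here only as an example of a typical non-streaming parser. With one record per line, a reader holds one line at a time, so peak memory is bounded by the largest record; `json.load` on a document that is one big array reads and parses the entire file before returning anything. The function names are hypothetical, and `io.StringIO` merely stands in for a real file (its data is of course already in memory).

```python
import io
import json


def count_records_streaming(fp):
    """Count records in a one-object-per-line stream, parsing line by line.

    Only the current line and its parsed object are held at once, so peak
    memory is bounded by the largest single record, not the file size.
    """
    count = 0
    for line in fp:
        if line.strip():          # skip blank lines
            json.loads(line)      # each line is a complete JSON text
            count += 1
    return count


def count_records_whole_document(fp):
    """Count records in a file holding one JSON array of objects.

    json.load reads and parses the entire document before returning, so
    the whole array must fit in memory at once.
    """
    return len(json.load(fp))


records = [{"id": i} for i in range(1000)]
one_per_line = io.StringIO("".join(json.dumps(r) + "\n" for r in records))
single_array = io.StringIO(json.dumps(records))

print(count_records_streaming(one_per_line))        # 1000
print(count_records_whole_document(single_array))   # 1000
```

Both functions return the same count; the difference is that the first never needs more than one record in memory at a time, while the second needs the whole document.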

Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-15 Thread codlife
Hi: I have doubts about the design of spark.read.json: why is the JSON file not a standard JSON file? Can anyone tell me the internal reason? Any advice is appreciated.