It is a plain Java IO error: your line is too long. You should change the
layout of your JSON file, so that each line is one small JSON object (the
line-delimited "JSON Lines" layout that sqlContext.read.json expects).
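For example, with a file in that layout, Spark parses each line as one record
(the field names and the .jsonl file name below are made up for illustration):

  {"visitor_id": 1, "page": "/home"}
  {"visitor_id": 2, "page": "/cart"}

  val landingVisitor =
    sqlContext.read.json("s3n://hist-ngdp/lvisitor/lvisitor-01-aug.jsonl")

Because every record ends at a newline, the file splits cleanly across
partitions.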

Please do not concatenate all the objects into an array and then write the
array on one line. You would have difficulty handling such a super-large
JSON array in Spark anyway.

Because the whole array is one object, it cannot be split into multiple
partitions.
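If you cannot change whatever produces the file, one option is a one-time
streaming conversion before the Spark job. Here is a rough sketch using
Jackson's streaming parser (Jackson is already on Spark's classpath); the
file names are hypothetical, and the point is that the whole 2 GB line is
never held in memory at once:

  import java.io.{File, PrintWriter}
  import com.fasterxml.jackson.core.JsonToken
  import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

  object ArrayToJsonLines {
    def main(args: Array[String]): Unit = {
      val mapper = new ObjectMapper()
      // Walk the file token by token instead of loading the whole array
      val parser = mapper.getFactory.createParser(new File("lvisitor-01-aug.json"))
      val out    = new PrintWriter("lvisitor-01-aug.jsonl")
      try {
        require(parser.nextToken() == JsonToken.START_ARRAY,
          "expected a top-level JSON array")
        while (parser.nextToken() == JsonToken.START_OBJECT) {
          val obj: JsonNode = mapper.readTree(parser) // consumes exactly one object
          out.println(obj.toString)                   // one compact object per line
        }
      } finally {
        parser.close()
        out.close()
      }
    }
  }

Upload the .jsonl output back to S3 and point sqlContext.read.json at it.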


On Tue, Oct 18, 2016 at 3:44 PM Chetan Khatri <ckhatriman...@gmail.com>
wrote:

> Hello Community members,
>
> I am getting an error while reading a large JSON file in Spark,
>
> *Code:*
>
> val landingVisitor =
>   sqlContext.read.json("s3n://hist-ngdp/lvisitor/lvisitor-01-aug.json")
>
> *Error:*
>
> 16/10/18 07:30:30 ERROR Executor: Exception in task 8.0 in stage 0.0 (TID 8)
> java.io.IOException: Too many bytes before newline: 2147483648
> at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:249)
> at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
> at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:135)
> at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
>
> What would be the resolution for this?
>
> Thanks in advance!
>
>
> --
> Yours Aye,
> Chetan Khatri.
>


Thanks,
David S.
