Are you sure that your JSON file has the right format? By default, spark.read.json(...) expects a file where *each line is a JSON object*.
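To illustrate the layout I mean, here is a minimal sketch. It assumes a local SparkSession and two made-up records trimmed from your sample; each string stands for one line of the file, and each line is one complete JSON object. In spark-shell the `spark` session already exists, so the builder line can be skipped there.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().appName("jsonl-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Each element stands for one line of a line-delimited JSON file.
val lines = Seq(
  """{"id":"1","entityMetadata":{"lastChange":"2018-05-11 01:09:18.0"},"type":"11"}""",
  """{"id":"2","entityMetadata":{"lastChange":"2018-05-11 01:09:18.0"},"type":"11"}"""
).toDS()

// spark.read.json also accepts a Dataset[String] (Spark 2.2+) and parses one object per element.
val df = spark.read.json(lines)
df.printSchema()
println(df.count())
```

With a file in this layout, Spark should be able to split the input by line instead of parsing the whole array as a single record.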
My wild guess is that

val hdf=spark.read.json("/user/tmp/hugedatafile")
hdf.show(2) or hdf.take(1)

gives OOM because it tries to fetch all the data into the driver. Can you reformat your input file and try again?

Best,
Anastasios

On Tue, Jun 5, 2018 at 8:39 PM, raksja <shanmugkr...@gmail.com> wrote:

> I have a json file which is a continuous array of objects of similar type
> [{},{}...], about 1.5GB uncompressed and 33MB gzip compressed.
>
> This hugedatafile is uploaded to hdfs, and it is not a JSONL file; it's a
> whole regular json file.
>
> [{"id":"1","entityMetadata":{"lastChange":"2018-05-11 01:09:18.0","createdDateTime":"2018-05-11 01:09:18.0","modifiedDateTime":"2018-05-11 01:09:18.0"},"type":"11"},
> {"id":"2","entityMetadata":{"lastChange":"2018-05-11 01:09:18.0","createdDateTime":"2018-05-11 01:09:18.0","modifiedDateTime":"2018-05-11 01:09:18.0"},"type":"11"},
> {"id":"3","entityMetadata":{"lastChange":"2018-05-11 01:09:18.0","createdDateTime":"2018-05-11 01:09:18.0","modifiedDateTime":"2018-05-11 01:09:18.0"},"type":"11"}..................]
>
> I get OOM on executors whenever I try to load this into spark.
>
> Try 1
> val hdf=spark.read.json("/user/tmp/hugedatafile")
> hdf.show(2) or hdf.take(1) gives OOM
>
> Try 2
> Took a small sampledatafile and got the schema from it to avoid schema inferring
> val sampleSchema=spark.read.json("/user/tmp/sampledatafile").schema
> val hdf=spark.read.schema(sampleSchema).json("/user/tmp/hugedatafile")
> hdf.show(2) or hdf.take(1) stuck for 1.5 hrs and gives OOM
>
> Try 3
> Repartitioned it before performing the action
> gives OOM
>
> Try 4
> Read https://issues.apache.org/jira/browse/SPARK-20980 completely
> val hdf = spark.read.option("multiLine", true).schema(sampleSchema).json("/user/tmp/hugedatafile")
> hdf.show(1) or hdf.take(1) gives OOM
>
> Can anyone help me here?
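One possible way to do the reformatting, as a rough sketch: the helper below (`arrayToJsonLines`, a hypothetical name, not a Spark API) walks the array one character at a time and writes each top-level object on its own line, so the whole array is never held in memory. It assumes the input is valid JSON and that every array element is an object, as in the sample above. A streaming JSON parser would be the more robust choice; this only shows the idea.

```scala
import java.io.{Reader, StringReader, StringWriter, Writer}

// Hypothetical helper: copies each top-level object of a JSON array to its own line.
// Returns the number of objects written.
def arrayToJsonLines(in: Reader, out: Writer): Long = {
  var depth = 0          // nesting depth inside the current element (0 = between elements)
  var inString = false   // true while inside a JSON string literal
  var escaped = false    // true right after a backslash inside a string
  var started = false    // true once the outer '[' has been consumed
  var count = 0L
  var c = in.read()
  while (c != -1) {
    val ch = c.toChar
    if (!started) {
      if (ch == '[') started = true
    } else if (inString) {
      // Copy string contents verbatim; braces inside strings must not affect depth.
      out.write(c)
      if (escaped) escaped = false
      else if (ch == '\\') escaped = true
      else if (ch == '"') inString = false
    } else if (depth == 0) {
      // Between elements: skip commas, whitespace and the closing ']'.
      if (ch == '{') { depth = 1; out.write(c) }
    } else {
      ch match {
        case '"' => inString = true; out.write(c)
        case '{' | '[' => depth += 1; out.write(c)
        case '}' | ']' =>
          depth -= 1
          out.write(c)
          if (depth == 0) { out.write("\n"); count += 1 }
        case '\n' | '\r' => // drop raw line breaks so each object stays on one line
        case _ => out.write(c)
      }
    }
    c = in.read()
  }
  count
}

// Usage on a tiny in-memory array shaped like the sample data.
val in = new StringReader(
  """[{"id":"1","entityMetadata":{"lastChange":"2018-05-11 01:09:18.0"},"type":"11"},{"id":"2","type":"11"}]""")
val out = new StringWriter()
val n = arrayToJsonLines(in, out)
val jsonLines = out.toString
print(jsonLines)
```

For the real file, the same function could be fed a BufferedReader and a BufferedWriter over a local copy of the data, and the output uploaded again and read with plain spark.read.json(...).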
--
Anastasios Zouzias <a...@zurich.ibm.com>