RE: How to accelerate reading json file?
If you already know the schema, you can pass it to the reader with the schema method, like this:

    import org.apache.spark.sql.types._

    val path = "examples/src/main/resources/jsonfile"
    // detailsSchema is the StructType of the nested "details" field, defined elsewhere
    val jsonSchema = StructType(
      StructField("id", StringType, true) ::
      StructField("reference", LongType, true) ::
      StructField("details", detailsSchema, true) ::
      StructField("value", StringType, true) :: Nil)
    val people = sqlContext.read.schema(jsonSchema).json(path)

If you have the schema defined as a separate small JSON file, you can load it directly with something like this:

    val jsonSchema = sqlContext.read.json("path/to/schema").schema

Thanks,
Ewan

From: Gavin Yue [mailto:yue.yuany...@gmail.com]
Sent: 06 January 2016 07:14
To: user
Subject: How to accelerate reading json file?

I am trying to read json files following the example:

    val path = "examples/src/main/resources/jsonfile"
    val people = sqlContext.read.json(path)

I have 1 Tb size files in the path. It took 1.2 hours to finish the reading to infer the schema.

But I already know the schema. Could I make this process short?

Thanks a lot.
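For reference, the explicit-schema approach from the reply above could be put together roughly as follows. This is only a sketch under stated assumptions: the thread never shows what `detailsSchema` contains, so its single `note` field is hypothetical, as are the object name and the `local[*]` master. It uses the `SQLContext` API that the thread uses. The last step shows one possible way to keep a schema outside the code, by serializing it with `StructType.json` and restoring it with `DataType.fromJson`. Note that `sqlContext.read.json("path/to/schema").schema`, by contrast, infers the schema from the records in that small file.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types._

object ReadJsonWithSchema {
  // Hypothetical nested schema: the thread does not show the fields of "details".
  val detailsSchema: StructType = StructType(
    StructField("note", StringType, nullable = true) :: Nil)

  // Top-level schema as given in the thread.
  val jsonSchema: StructType = StructType(
    StructField("id", StringType, nullable = true) ::
    StructField("reference", LongType, nullable = true) ::
    StructField("details", detailsSchema, nullable = true) ::
    StructField("value", StringType, nullable = true) :: Nil)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master for trying this out; on a cluster the
    // master would normally be supplied by spark-submit instead.
    val conf = new SparkConf().setAppName("ReadJsonWithSchema").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Supplying the schema means the reader does not need to infer it.
    val path = "examples/src/main/resources/jsonfile"
    val people = sqlContext.read.schema(jsonSchema).json(path)
    people.printSchema()

    // Optional: serialize the schema to a JSON string and restore it later.
    val schemaJson: String = jsonSchema.json
    val restored: StructType = DataType.fromJson(schemaJson).asInstanceOf[StructType]
    println(restored == jsonSchema)

    sc.stop()
  }
}
```

With a schema supplied this way, fields in the data that are not listed in the schema are simply not read, so the schema needs to cover every column you intend to query.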
Re: How to accelerate reading json file?
Hi all,

I want to ask how exactly reading a >1 TB file differs on a standalone cluster versus a YARN or Mesos cluster?

On Wednesday 6 January 2016, Gavin Yue wrote:

> I am trying to read json files following the example:
>
> val path = "examples/src/main/resources/jsonfile"
> val people = sqlContext.read.json(path)
>
> I have 1 Tb size files in the path. It took 1.2 hours to finish the reading
> to infer the schema.
>
> But I already know the schema. Could I make this process short?
>
> Thanks a lot.

--
Regards,
Vijay Gharge
Re: How to accelerate reading json file?
Hi,

You can try this:

    sqlContext.read.format("json").option("samplingRatio", "0.1").load("path")

If it still takes too long, feel free to experiment with the samplingRatio.

Thanks,
Vishnu

On Wed, Jan 6, 2016 at 12:43 PM, Gavin Yue wrote:

> I am trying to read json files following the example:
>
> val path = "examples/src/main/resources/jsonfile"
> val people = sqlContext.read.json(path)
>
> I have 1 Tb size files in the path. It took 1.2 hours to finish the reading
> to infer the schema.
>
> But I already know the schema. Could I make this process short?
>
> Thanks a lot.
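The samplingRatio suggestion above can be combined with an explicit schema: infer the schema once from a sample, then reuse it for the full read. The following is a minimal sketch, not a definitive implementation. The object name, the helper `readWithSampledSchema` and its parameters are hypothetical; only the `samplingRatio` option and the reader calls come from the thread.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.StructType

object SampledSchemaRead {
  // Hypothetical helper: infers the schema from a fraction of the records,
  // then reads the data with that schema so inference is not repeated.
  def readWithSampledSchema(sqlContext: SQLContext, path: String, ratio: Double): DataFrame = {
    // Schema inference looks at roughly `ratio` of the input records.
    val sampledSchema: StructType = sqlContext.read
      .format("json")
      .option("samplingRatio", ratio.toString)
      .load(path)
      .schema

    // Reuse the inferred schema for the actual read.
    sqlContext.read.schema(sampledSchema).json(path)
  }
}
```

A call might look like `SampledSchemaRead.readWithSampledSchema(sqlContext, path, 0.1)`. The trade-off is that a field appearing only in records outside the sample will be missing from the inferred schema, so a low ratio is safest when the records are uniform.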