RE: How to accelerate reading json file?

2016-01-06 Thread Ewan Leith
If you already know the schema, you can pass it to the reader with schema(),
which skips the schema-inference pass over the data:


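import org.apache.spark.sql.types._

val path = "examples/src/main/resources/jsonfile"

// detailsSchema is the StructType of the nested "details" object;
// it must be defined before this point (it is not shown in this message).
val jsonSchema = StructType(
  StructField("id", StringType, true) ::
  StructField("reference", LongType, true) ::
  StructField("details", detailsSchema, true) ::
  StructField("value", StringType, true) :: Nil)

// sqlContext is the SQLContext from the Spark 1.x shell.
// Supplying the schema up front means Spark does not scan the data to infer it.
val people = sqlContext.read.schema(jsonSchema).json(path)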

If you have the schema captured in a small, separate JSON file, you can infer
it from that file directly and reuse it:

val jsonSchema = sqlContext.read.json("path/to/schema").schema
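Why does passing a schema save so much time? Inference has to look at every record before the real read starts, because any record might add a new field or carry a conflicting type. The plain-Scala sketch below is a toy model, not Spark's actual implementation: each record is modeled as a hypothetical Map from field name to a type name, the maps are merged into one schema, and conflicting types are widened to "string" (a simplification, roughly like Spark falling back to StringType for incompatible types). The point is the cost: the merge folds over every record, which is the extra pass over the full 1 TB that schema() avoids.

```scala
// Toy model of JSON schema inference (illustration only, NOT Spark's code).
// Each record is modeled as a map from field name to a type name.
object SchemaInferenceModel {
  type Record = Map[String, String]
  type Schema = Map[String, String]

  // Merge one record's fields into the running schema.
  // Conflicting types are widened to "string" (a simplification).
  def merge(schema: Schema, record: Record): Schema =
    record.foldLeft(schema) { case (acc, (field, tpe)) =>
      acc.get(field) match {
        case Some(existing) if existing != tpe => acc.updated(field, "string")
        case _                                 => acc.updated(field, tpe)
      }
    }

  // Inference folds over every record; returns the schema and how many records were visited.
  def infer(records: Seq[Record]): (Schema, Int) =
    (records.foldLeft(Map.empty[String, String])(merge), records.size)
}
```

For example, three records in which "id" appears twice as a string and once as a long produce "id" -> "string", and all three records must be visited to reach that answer.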

Thanks,
Ewan

From: Gavin Yue [mailto:yue.yuany...@gmail.com]
Sent: 06 January 2016 07:14
To: user 
Subject: How to accelerate reading json file?

I am trying to read JSON files, following the example:

val path = "examples/src/main/resources/jsonfile"

val people = sqlContext.read.json(path)

I have 1 TB of files in the path. It took 1.2 hours just to read them and
infer the schema.

But I already know the schema. Can I make this process shorter?

Thanks a lot.





Re: How to accelerate reading json file?

2016-01-06 Thread Vijay Gharge
Hi all,

I want to ask: how exactly does reading a >1 TB file differ on a standalone
cluster vs. a YARN or Mesos cluster?

On Wednesday 6 January 2016, Gavin Yue wrote:

> I am trying to read JSON files, following the example:
>
> val path = "examples/src/main/resources/jsonfile"
> val people = sqlContext.read.json(path)
>
> I have 1 TB of files in the path. It took 1.2 hours just to read them and
> infer the schema.
>
> But I already know the schema. Can I make this process shorter?
>
> Thanks a lot.

-- 
Regards,
Vijay Gharge


Re: How to accelerate reading json file?

2016-01-05 Thread VISHNU SUBRAMANIAN
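Hi,

You can try this:

sqlContext.read.format("json").option("samplingRatio", "0.1").load("path")

If it is still slow, experiment with the samplingRatio value. Keep in mind that
a smaller ratio infers the schema from fewer records, so a field that appears
only in unsampled records can be missing from the result.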
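To see what samplingRatio trades away, the plain-Scala sketch below (a toy model, not Spark's implementation) infers a set of field names from a random fraction of hypothetical records, each modeled as the set of fields it contains. A smaller ratio means fewer records are parsed and merged, but a rare field that lands only in unsampled records is lost. Note that in this model every record is still iterated to decide whether to keep it; sampling cuts the parsing and merging work, not the volume of data read. A fixed seed keeps the result deterministic.

```scala
import scala.util.Random

// Toy model of sampled schema inference (illustration only, NOT Spark's code).
// Each record is modeled as the set of field names it contains.
object SampledInference {
  // Keep each record with probability `ratio`, then union the field names seen.
  // Returns (inferred field names, number of records parsed for inference).
  def infer(records: Seq[Set[String]], ratio: Double, seed: Long): (Set[String], Int) = {
    require(ratio > 0.0 && ratio <= 1.0, "ratio must be in (0, 1]")
    val rng = new Random(seed)
    // Every record is visited here; only the kept ones contribute fields.
    val sample = records.filter(_ => rng.nextDouble() < ratio)
    (sample.foldLeft(Set.empty[String])(_ ++ _), sample.size)
  }
}
```

With ratio 1.0 every record is kept (nextDouble() is always below 1.0), so the result matches full inference; lowering the ratio shrinks the work at the risk of missing rare fields.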

Thanks,
Vishnu

On Wed, Jan 6, 2016 at 12:43 PM, Gavin Yue wrote:

> I am trying to read JSON files, following the example:
>
> val path = "examples/src/main/resources/jsonfile"
> val people = sqlContext.read.json(path)
>
> I have 1 TB of files in the path. It took 1.2 hours just to read them and
> infer the schema.
>
> But I already know the schema. Can I make this process shorter?
>
> Thanks a lot.