On 16 Oct 2017, at 16:22, Silvio Fiorito wrote:
> [...] then just infer the schema from a single file and reuse it when loading
> the whole data set:
Well, that is a possibility indeed.
Thanks,
Jeroen
If you’re confident the schema of all files is consistent, then just infer the
schema from a single file and reuse it when loading the whole data set:
val schema = spark.read.json("/path/to/single/file.json").schema
val wholeDataSet = spark.read.schema(schema).json("/path/to/whole/datasets")
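Building on that, the inferred schema can also be persisted so later jobs never need to touch a sample file at all. A minimal sketch, assuming a `SparkSession` named `spark` and that you store the JSON string somewhere yourself (the paths and storage step are placeholders):

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Serialize the inferred schema to a JSON string (Spark's public types API).
val schemaJson: String = schema.json

// ... persist schemaJson to a file or config of your choosing ...

// In a later job, restore the schema and skip inference entirely.
val restored = DataType.fromJson(schemaJson).asInstanceOf[StructType]
val df = spark.read.schema(restored).json("/path/to/whole/datasets")
```

The round-trip through `StructType.json` / `DataType.fromJson` is lossless, so the restored schema matches the inferred one exactly.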
Hello Spark users,
Does anyone know if there is a way to generate the Scala code for a complex
structure just from the output of dataframe.printSchema?
I have to analyse a significant volume of data and want to explicitly set the
schema(s) to avoid having to read my (compressed) JSON files just to infer the schema.
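For reference, an explicit schema can be written by hand in Scala as a `StructType`, mirroring the tree that `printSchema` shows. A hedged sketch with hypothetical field names (there is no built-in Spark utility that emits Scala source from `printSchema` output, so the translation is manual):

```scala
import org.apache.spark.sql.types._

// Corresponds to a printSchema tree like:
// root
//  |-- id: long (nullable = true)
//  |-- name: string (nullable = true)
//  |-- address: struct (nullable = true)
//  |    |-- city: string (nullable = true)
//  |    |-- zip: string (nullable = true)
val schema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("name", StringType, nullable = true),
  StructField("address", StructType(Seq(
    StructField("city", StringType, nullable = true),
    StructField("zip", StringType, nullable = true)
  )), nullable = true)
))
```

This schema can then be passed to `spark.read.schema(schema).json(...)` so no inference pass is needed.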