The doc for the DataFrameReader#json(RDD[String]) method says:

"Unless the schema is specified using schema function, this function goes
through the input once to determine the input schema."

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader
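
To check that I'm reading that correctly, here is a toy sketch in plain Python of the two modes as I understand them. This is not Spark code and not how Spark implements it; the names infer_schema and to_rows are made up for illustration, and the two-record input is invented.

```python
import json


def infer_schema(lines):
    """Extra pass: collect every field name and the value types seen for it."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for name, value in record.items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return schema


def to_rows(lines, columns):
    """Build fixed-width rows, using None where a record lacks a field."""
    return [tuple(json.loads(line).get(c) for c in columns) for line in lines]


# Hypothetical input: the second record has a field the first one lacks.
lines = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "tags": ["x"]}',
]

# No schema given: read all the input once to find the columns, then build rows.
inferred = infer_schema(lines)
columns = sorted(inferred)
rows = to_rows(lines, columns)

# Schema given by the caller: the inference pass is skipped entirely.
explicit_rows = to_rows(lines, ["id", "name"])
```

In this sketch the column list is only known after the last record has been read, since "tags" first shows up in the second record.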

Why is this separate pass necessary? Why can't it create the DataFrame at the
same time as it's determining the schema?

Thanks.
