Hi all,

We really like the ability to infer a schema from JSON contained in an RDD, but when we're running Spark Streaming on small batches of data, Spark sometimes infers a more specific type than it should. For example, if the JSON in one small batch happens to contain only integer values for what is really a String field, Spark will class that field as an integer type for one streaming batch and then as a String for the next.
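
To make the drift concrete, here's a minimal sketch of what we're seeing, assuming a spark-shell style sc/sqlContext and a field name "id" invented for illustration (Spark typically infers long for JSON integers):

  // Two micro-batches carrying the same logical field "id".
  // Batch 1 happens to contain only digits; batch 2 contains free text.
  val batch1 = sc.parallelize(Seq("""{"id": 123}""", """{"id": 456}"""))
  val batch2 = sc.parallelize(Seq("""{"id": "abc"}"""))

  // Inference only sees the values in the current batch, so the same
  // field comes back with a different type on each batch.
  sqlContext.read.json(batch1).printSchema()  // |-- id: long
  sqlContext.read.json(batch2).printSchema()  // |-- id: string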
Instead, we'd rather read every value as a String and handle any casting to a desired type later in the process. I don't think there's currently any simple way to avoid this, but we could add the functionality in JacksonParser.scala, probably in convertField:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala

Does anyone know an easier and cleaner way to do this? (The one workaround we can see, supplying an explicit all-String schema, is sketched below the signature.)

Thanks,
Ewan
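
For reference, a minimal sketch of the explicit-schema workaround mentioned above, again assuming a spark-shell sqlContext, with the field names invented for illustration. Supplying a schema up front skips inference entirely:

  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  // Hypothetical field names; in practice we'd have to list every
  // field we expect to see in the JSON.
  val fields = Seq("id", "name")
  val allStrings = StructType(fields.map(f => StructField(f, StringType, nullable = true)))

  // With a schema supplied, every batch parses "id" as a string,
  // whatever the JSON values in that batch happen to look like.
  val df = sqlContext.read.schema(allStrings).json(batch1)
  df.printSchema()  // |-- id: string, |-- name: string

The catch is that this requires knowing the field names in advance, which is exactly what schema inference was buying us in the first place.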