Let's say I have two formats for JSON objects in the same file:

schema1 = { "location": "12345 My Lane" }
schema2 = { "location": { "houseAddress": "1234 My Lane" } }
From my tests, it looks like the current inferSchema() function ends up with only StructField("location", StringType). What would be the best (and most efficient) way to process both types of records? In other words, I'd like to merge the two schemas so that a user can query either form (location = "12345 My Lane" or location.houseAddress = "1234 My Lane").

Really, I'm not aiming to load from files; I'm trying to find a user-friendly way to evolve schemas over time. I'm combining the schemas (Set[(String, DataType)]) in an Accumulo table and, when a user wants to mine some data, presenting them with the schemas for the time period of interest. I'd like a good technique for saying "For the time period given, you have two ways of querying an address..." while still letting them use both.
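One possible way to produce the "you have two ways of querying an address" list is to flatten every accumulated (name, type) pair into the dotted paths a user could query, then group those paths by top-level field. This is only a sketch under stated assumptions: DType, StrType and StructT are a hypothetical stand-in for Spark SQL's DataType/StringType/StructType so the example runs without Spark, and SchemaPaths, paths and alternatives are illustrative names, not an existing API.

```scala
// Hypothetical minimal stand-in for Spark SQL's DataType hierarchy,
// used only so this sketch compiles and runs without Spark.
sealed trait DType
case object StrType extends DType
final case class StructT(fields: List[(String, DType)]) extends DType

object SchemaPaths {
  // Flatten one field into every dotted leaf path a user could query,
  // e.g. ("location", StructT(houseAddress)) -> "location.houseAddress".
  def paths(prefix: String, t: DType): List[(String, DType)] = t match {
    case StructT(fields) =>
      fields.flatMap { case (n, ft) => paths(s"$prefix.$n", ft) }
    case leaf => List(prefix -> leaf)
  }

  // Group the leaf paths of all accumulated schemas by top-level field,
  // so each field maps to the alternative ways it can be queried.
  def alternatives(schemas: Set[(String, DType)]): Map[String, List[String]] =
    schemas.toList
      .flatMap { case (name, t) => paths(name, t).map(p => name -> p._1) }
      .groupBy(_._1)
      .map { case (k, v) => k -> v.map(_._2).distinct.sorted }

  def main(args: Array[String]): Unit = {
    // The two record shapes from the question, as a combined schema set.
    val combined: Set[(String, DType)] = Set(
      "location" -> StrType,
      "location" -> StructT(List("houseAddress" -> StrType))
    )
    // Prints: Map(location -> List(location, location.houseAddress))
    println(alternatives(combined))
  }
}
```

Keeping the conflicting types side by side, rather than collapsing them to a single StringType, is what lets the per-period listing show both query forms.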