Let's say I have two formats for JSON objects in the same file:
schema1 = { "location": "12345 My Lane" }
schema2 = { "location":{"houseAddress":"1234 My Lane"} }

From my tests, it looks like the current inferSchema() function will end up
with only StructField("location", StringType).

What would be the best (and most efficient) way to process both types of
records? In other words, I'd like to merge the two schemas so that a user
can query either form (location = "12345 My Lane" or location.houseAddress
= "1234 My Lane").
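
To make the goal concrete, here is a rough sketch of one way the two shapes
could be merged: flatten each schema into (dotted path, type) pairs and take
the union. The DataType, StructField and StructType classes below are
simplified stand-ins written for this example, not Spark SQL's own classes,
and the nested field is spelled houseAddress as in the query above.

```scala
object SchemaMergeSketch {
  // Simplified stand-ins for Spark SQL's type classes (an assumption made so
  // the sketch compiles without Spark on the classpath).
  sealed trait DataType
  case object StringType extends DataType
  final case class StructField(name: String, dataType: DataType)
  final case class StructType(fields: Seq[StructField]) extends DataType

  // Flatten a schema into (dotted path, leaf type) pairs.
  def flatten(dt: DataType, prefix: String = ""): Set[(String, DataType)] = dt match {
    case StructType(fields) =>
      fields.flatMap { f =>
        val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
        flatten(f.dataType, path)
      }.toSet
    case leaf => Set(prefix -> leaf)
  }

  // schema1: location is a plain string.
  val schema1: StructType = StructType(Seq(StructField("location", StringType)))

  // schema2: location is a struct with a houseAddress string inside.
  val schema2: StructType = StructType(Seq(
    StructField("location", StructType(Seq(StructField("houseAddress", StringType))))
  ))

  // The union keeps both ways of addressing the data:
  // ("location", StringType) and ("location.houseAddress", StringType).
  val merged: Set[(String, DataType)] = flatten(schema1) ++ flatten(schema2)
}
```

This only records which paths exist. It does not by itself produce a single
StructType that could be applied to both record shapes, since "location"
would have to be a string and a struct at the same time.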

I'm not really aiming to load from files; I'm trying to find a
user-friendly way to evolve schemas over time. I'm currently combining the
schemas (Set[(String, DataType)]) in an Accumulo table and, when a user
wants to mine some data, presenting them with the schemas for the time
period of interest. I'd like a good technique for saying "For the time
period given, you have two ways of querying an address..." while still
letting them specify both ways.
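
As a rough illustration of the kind of reporting I have in mind, the sketch
below unions the (path, type) pairs seen in a time range and lists the
top-level names that can be reached by more than one path. Everything here is
hypothetical: the in-memory Map stands in for the Accumulo table, the bucket
numbers are made up, and DataType is a minimal stand-in rather than Spark's
class.

```scala
object SchemaTimelineSketch {
  // Minimal stand-in for Spark SQL's DataType (assumption for this sketch).
  sealed trait DataType
  case object StringType extends DataType

  type Field = (String, DataType)

  // Hypothetical stand-in for the Accumulo table:
  // time bucket -> (path, type) pairs observed in that bucket.
  val observed: Map[Long, Set[Field]] = Map(
    (1L, Set[Field](("location", StringType))),
    (2L, Set[Field](("location.houseAddress", StringType))),
    (3L, Set[Field](("name", StringType)))
  )

  // Union every schema seen in the inclusive bucket range [from, to].
  def schemaFor(from: Long, to: Long): Set[Field] =
    observed.toSeq
      .filter { case (bucket, _) => from <= bucket && bucket <= to }
      .flatMap { case (_, fields) => fields }
      .toSet

  // Group paths by their top-level name; a name that maps to more than one
  // path has several ways of being queried in that period.
  def alternatives(schema: Set[Field]): Map[String, Set[String]] =
    schema
      .map { case (path, _) => path }
      .groupBy(path => path.takeWhile(_ != '.'))
      .filter { case (_, paths) => paths.size > 1 }
}
```

With this made-up data, alternatives(schemaFor(1L, 3L)) reports "location"
with the two paths "location" and "location.houseAddress", while a range
covering only bucket 3 reports nothing.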
