Wanted to take something like this https://github.com/fitzscott/AirQuality/blob/master/HiveDataTypeGuesser.java and create a Hive UDAF to create an aggregate function that returns a data type guess. Am I inventing a wheel? Does Spark have something like this already built-in? Would be very useful for new wide datasets to explore data. Would be helpful for ML too, e.g. to decide categorical vs numerical variables.
Ruslan