Re: Spark data type guesser UDAF

2015-09-21 Thread Ruslan Dautkhanov
Does this deserve a JIRA in Spark / Spark MLlib? How do you guys normally determine data types? Frameworks like h2o automatically determine data types by scanning a sample of the data, or the whole dataset, so one can then decide, e.g., whether a variable should be categorical or numerical. Another

Spark data type guesser UDAF

2015-09-17 Thread Ruslan Dautkhanov
I wanted to take something like this https://github.com/fitzscott/AirQuality/blob/master/HiveDataTypeGuesser.java and turn it into a Hive UDAF, i.e. an aggregate function that returns a data type guess. Am I reinventing the wheel? Does Spark already have something like this built in? Would be very useful
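
To make the idea of an aggregate that returns a data type guess concrete, here is a rough sketch in plain Python. It is not the logic of the linked HiveDataTypeGuesser.java and it does not use any Spark or Hive API. The function names (guess_value_type, widen, guess_column_type) and the four type labels are made up for illustration. The assumption is that each value arrives as a raw string and that the guess for a column only ever widens: null, then int, then double, then string.

```python
from typing import Iterable, Optional

# Hypothetical widening order: a column's guess can only move to the right.
_ORDER = ["null", "int", "double", "string"]


def guess_value_type(value: Optional[str]) -> str:
    """Guess the narrowest label that can hold one raw string value."""
    if value is None or value.strip() == "":
        return "null"
    text = value.strip()
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        # Note: Python's float() also accepts "nan" and "inf".
        float(text)
        return "double"
    except ValueError:
        return "string"


def widen(a: str, b: str) -> str:
    """Combine two guesses by keeping the wider one."""
    return a if _ORDER.index(a) >= _ORDER.index(b) else b


def guess_column_type(values: Iterable[Optional[str]]) -> str:
    """Fold the per-value guesses for a whole column into a single guess."""
    guess = "null"
    for value in values:
        guess = widen(guess, guess_value_type(value))
    return guess
```

Because widen is commutative and associative, guesses computed on separate partitions can be combined in any order, which is the property a UDAF's merge step would need. A real implementation would probably also want dates, booleans, and a tolerance for a few bad rows.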