Hi All,

We have a feature in ML that suggests whether a given column of a dataset is categorical or numerical. Currently we decide this with a threshold value: the maximum number of categories that a non-string categorical feature can have. If a column exceeds that number, the feature is treated as numerical. However, this measure does not work well for most datasets.
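For reference, the current threshold rule described above can be sketched roughly as follows. This is only an illustration, not the actual ML code: the function name, the default of 20 categories, and the assumption that any column containing strings is categorical are all hypothetical.

```python
def suggest_feature_type(values, max_categories=20):
    """Suggest "CATEGORICAL" or "NUMERICAL" for one data column.

    max_categories is a hypothetical default, not the value ML uses.
    """
    # Ignore missing entries when counting categories.
    non_missing = [v for v in values if v is not None]

    # Assumption: a column containing strings is always categorical.
    if any(isinstance(v, str) for v in non_missing):
        return "CATEGORICAL"

    # Non-string column: count the distinct values and compare to the threshold.
    distinct = len(set(non_missing))
    return "CATEGORICAL" if distinct <= max_categories else "NUMERICAL"
```

With this rule, a small numerical column such as [0.5, 1.5, 2.5, 0.5] is suggested as categorical simply because it has few distinct values, which is the kind of case where the threshold is a poor measure.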
Can we use the skewness of a column's distribution as the measure instead? For example, could we say that a column is numerical if the absolute value of the skewness of its distribution is less than a certain threshold (say 0.01)?

*References*:
http://www.itrcweb.org/gsmc-1/Content/GW%20Stats/5%20Methods%20in%20indiv%20Topics/5%206%20Distributional%20Tests.htm

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
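To make the skewness proposal concrete, here is a minimal sketch of the check. It assumes the moment-based (Fisher-Pearson) skewness coefficient, m3 / m2^1.5, and the 0.01 threshold mentioned above. The function names are hypothetical, and this is not existing ML code.

```python
def skewness(values):
    """Moment-based (Fisher-Pearson) skewness: m3 / m2**1.5."""
    n = len(values)
    if n == 0:
        raise ValueError("skewness is undefined for an empty column")
    mean = sum(values) / n
    # Second and third central moments (divided by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant column")
    return m3 / m2 ** 1.5


def looks_numerical(values, threshold=0.01):
    """Proposed rule: numerical if |skewness| is below the threshold."""
    return abs(skewness(values)) < threshold
```

For example, looks_numerical([1, 2, 3, 4, 5]) is True because the values are symmetric around the mean, while looks_numerical([1, 1, 1, 1, 10]) is False because the skewness is 1.5.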
