Hi All,

We have a feature in ML that suggests whether a given data column of a
dataset is categorical or numerical. Currently, we determine this using a
threshold value: the maximum number of categories that a non-string
categorical feature can have. If a column exceeds this number, the feature
is treated as numerical. However, this is not a reliable measure for most
datasets.
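
To make the current rule concrete, here is a minimal sketch in Python. It is
not the actual ML code: the function name, the default threshold of 20 and
the assumption that "number of categories" means the number of distinct
values in a non-string column are all mine, for illustration only.

```python
def suggest_type_by_cardinality(values, max_categories=20):
    """Suggest a column type from its number of distinct values.

    Hypothetical helper: max_categories stands in for the threshold
    described above; 20 is an arbitrary illustrative default.
    """
    distinct_count = len(set(values))
    # At or below the threshold the column is treated as categorical;
    # above it, as numerical.
    if distinct_count <= max_categories:
        return "categorical"
    return "numerical"


# A 0/1 flag column has two distinct values.
print(suggest_type_by_cardinality([0, 1, 0, 1, 1], max_categories=2))
# A column with 100 distinct values exceeds a threshold of 20.
print(suggest_type_by_cardinality(list(range(100)), max_categories=20))
```

The result depends entirely on max_categories, so a small numerical column
with few distinct values gets the same suggestion as a genuinely categorical
one.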

Can we use the 'skewness' of a distribution as a measure to determine this?
Can we say that a column is numerical if the absolute value of the skewness
of its distribution is less than a certain threshold (say 0.01)?
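
A rough sketch of the proposed rule in Python, assuming "skewness" means the
moment-based sample skewness m3 / m2^1.5 (third central moment over the
second central moment raised to 3/2). The function names are hypothetical,
and treating a constant column (where skewness is undefined) as categorical
is my own assumption, not something decided here.

```python
import math


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        # Constant column: skewness is undefined.
        return math.nan
    return m3 / m2 ** 1.5


def suggest_type_by_skewness(values, threshold=0.01):
    """Hypothetical rule: numerical if |skewness| < threshold."""
    skew = sample_skewness(values)
    if math.isnan(skew):
        # Assumption: a constant column is treated as categorical.
        return "categorical"
    return "numerical" if abs(skew) < threshold else "categorical"


print(suggest_type_by_skewness([1, 2, 3, 4, 5]))   # symmetric values
print(suggest_type_by_skewness([1, 1, 1, 1, 10]))  # strongly right-skewed
```

Under this rule a perfectly balanced integer-coded column such as
[1, 2, 3, 4, 5] has skewness 0 and is suggested as numerical, which is worth
keeping in mind when evaluating the idea.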

*References*:
http://www.itrcweb.org/gsmc-1/Content/GW%20Stats/5%20Methods%20in%20indiv%20Topics/5%206%20Distributional%20Tests.htm

-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev