Hi Nirmal,

IMO I don't think we can use skewness in this case. Skewness measures how asymmetric a distribution is. For example, a numerical/continuous feature (not categorical) that is normally distributed has a skewness of 0. But an encoded categorical feature with a symmetric distribution also has a skewness of 0, so a skewness close to 0 cannot tell the two apart.
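To make the point concrete, here is a rough sketch in plain Python (not taken from the ML codebase). The `skewness` helper, the sample sizes, and the category counts are all made up for illustration; the helper uses the simple moment-based estimate m3 / m2^1.5.

```python
import random


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical continuous feature: draws from a normal distribution.
rng = random.Random(42)
continuous = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Hypothetical categorical feature encoded as 0..4, with counts that are
# symmetric around the middle category.
categorical = [0] * 10 + [1] * 20 + [2] * 40 + [3] * 20 + [4] * 10

skew_continuous = skewness(continuous)
skew_categorical = skewness(categorical)

print("continuous :", skew_continuous)
print("categorical:", skew_categorical)
```

Both values come out at or very close to 0, so a rule of the form "|skewness| below some threshold means numerical" would label this encoded categorical column as numerical too.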
We had this concern at the beginning as well: how do we determine whether a feature is categorical or continuous? Usually this depends strictly on the domain of the dataset (i.e. the user has to decide it based on knowledge of the data). That was the idea behind letting the user change the data type. But since we needed a default option, we had to go with the threshold approach, which was the only option we could come up with. I did a bit of research on this too, but found no other solution :(

Thanks,
Supun

On Thu, Aug 13, 2015 at 1:49 AM, Nirmal Fernando <[email protected]> wrote:

> Hi All,
>
> We have a feature in ML where we suggest a given data column of a dataset
> is categorical or numerical. Currently, how we determine this is by using a
> threshold value (The maximum number of categories that can have in a
> non-string categorical feature. If exceeds, the feature will be treated
> as a numerical feature.). But this is not a successful measurement for
> most of the datasets.
>
> Can we use 'skewness' of a distribution as a measurement to determine
> this? Can we say, a column is numerical, if the modulus of the skewness of
> the distribution is less than a certain threshold (say 0.01) ?
>
> *References*:
>
> http://www.itrcweb.org/gsmc-1/Content/GW%20Stats/5%20Methods%20in%20indiv%20Topics/5%206%20Distributional%20Tests.htm
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
