Hi Supun,

Thanks for the reply.
On Thu, Aug 13, 2015 at 8:09 PM, Supun Sethunga <[email protected]> wrote:

> Hi Nirmal,
>
> IMO, I don't think we would be able to use skewness in this case. Skewness
> says how symmetric the distribution is. For example, if we consider a
> numerical/continuous feature (not categorical) which is normally
> distributed, then the skewness would be 0. Also, for a categorical (encoded)
> feature having a symmetric distribution, the skewness would again be 0.
>

What's the probability of seeing a normal distribution in a real dataset?
IMO it's very low, and since what we're doing here is only a suggestion, do
you see it as an issue?

>
> We did have this concern at the beginning as well, regarding how we could
> determine whether a feature is categorical or continuous. Usually this is
> strictly dependent on the domain of the dataset (i.e. the user has to decide
> this with knowledge about the data). That was the idea behind letting the
> user change the data type. But since we needed a default option, we had to
> go for the threshold approach, which was the only option we could come up
> with. I did a bit of research on this too, but found no other
> solution :(
>
> Thanks,
> Supun
>
> On Thu, Aug 13, 2015 at 1:49 AM, Nirmal Fernando <[email protected]> wrote:
>
>> Hi All,
>>
>> We have a feature in ML where we suggest whether a given column of a
>> dataset is categorical or numerical. Currently, we determine this using a
>> threshold value (the maximum number of categories a non-string categorical
>> feature can have; if a column exceeds it, the feature is treated as
>> numerical). But this is not a reliable measure for most datasets.
>>
>> Can we use the 'skewness' of a distribution as a measure to determine
>> this? Can we say a column is numerical if the absolute value of the
>> skewness of its distribution is less than a certain threshold (say 0.01)?
>>
>> *References*:
>>
>> http://www.itrcweb.org/gsmc-1/Content/GW%20Stats/5%20Methods%20in%20indiv%20Topics/5%206%20Distributional%20Tests.htm
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Team Lead - WSO2 Machine Learner
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/
>
>
> --
> *Supun Sethunga*
> Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
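To make the two heuristics discussed in this thread concrete, here is a minimal sketch in Python (standard library only). It is not WSO2 ML code: the function names, the `max_categories` parameter and the sample columns are hypothetical illustrations. Skewness is assumed to be the Fisher-Pearson coefficient g1 = m3 / m2^(3/2) computed from population moments; bias-adjusted definitions would give slightly different values.

```python
from math import fsum
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson skewness g1 = m3 / m2**1.5, using population (biased) moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = fsum(values) / n
    deviations = [x - mean for x in values]
    m2 = fsum(d * d for d in deviations) / n
    m3 = fsum(d * d * d for d in deviations) / n
    if m2 == 0.0:
        return 0.0  # constant column: no spread, treat as symmetric
    return m3 / m2 ** 1.5


def is_categorical_by_threshold(values: Sequence[float], max_categories: int) -> bool:
    """Distinct-value heuristic: at most `max_categories` distinct values -> categorical.

    `max_categories` is a hypothetical stand-in for the ML threshold setting.
    """
    return len(set(values)) <= max_categories


def is_numerical_by_skewness(values: Sequence[float], tolerance: float = 0.01) -> bool:
    """Rule proposed in the thread: |skewness| < tolerance -> numerical."""
    return abs(sample_skewness(values)) < tolerance


if __name__ == "__main__":
    # Integer-encoded categorical column: codes 0, 1, 2 in a symmetric 1:2:1 ratio.
    encoded_category = [0, 1, 1, 2] * 25
    # Evenly spaced continuous values, symmetric around 0.
    symmetric_continuous = [i / 10 for i in range(-50, 51)]
    # Right-skewed continuous values.
    skewed_continuous = [k ** 3 for k in range(1, 51)]

    for name, column in [
        ("encoded_category", encoded_category),
        ("symmetric_continuous", symmetric_continuous),
        ("skewed_continuous", skewed_continuous),
    ]:
        print(
            f"{name}: skewness={sample_skewness(column):.3f}, "
            f"categorical_by_threshold={is_categorical_by_threshold(column, 10)}, "
            f"numerical_by_skewness={is_numerical_by_skewness(column)}"
        )
```

The symmetric encoded column has a skewness of exactly 0, so the |skewness| < 0.01 rule would label it numerical (Supun's objection), while the right-skewed continuous column would be labeled categorical. The distinct-count threshold classifies all three toy columns as expected, but only because `max_categories=10` happens to suit them.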
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
