Hi Supun,

Thanks for the reply.

On Thu, Aug 13, 2015 at 8:09 PM, Supun Sethunga <[email protected]> wrote:

> Hi Nirmal,
>
> IMO I don't think we would be able to use skewness in this case. Skewness
> says how symmetric the distribution is. For example, if we consider a
> numerical/continuous feature (not categorical) which is normally
> distributed, then the skewness would be 0. Also, for a categorical
> (encoded) feature having a symmetric distribution, the skewness would
> again be 0.
>

What's the probability of seeing a normal distribution in a real dataset?
IMO it's very low, and since what we're doing here is only a suggestion, do
you see it as an issue?
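
To make the comparison easier to try on real columns, here is a rough
sketch of both heuristics in plain Python. The function names, the sample
columns and the max_categories default of 20 are assumptions made up for
illustration (the 0.01 skewness threshold is the one proposed earlier in
this thread); this is not how WSO2 ML implements the check.

```python
def skewness(values):
    """Population skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant column: no spread, treat as zero skew
    return m3 / m2 ** 1.5


def suggest_by_unique_count(values, max_categories=20):
    """Threshold heuristic: few distinct values -> categorical.
    max_categories=20 is a hypothetical default, not the product's value."""
    return "categorical" if len(set(values)) <= max_categories else "numerical"


def suggest_by_skewness(values, threshold=0.01):
    """Proposed heuristic: |skewness| below the threshold -> numerical."""
    return "numerical" if abs(skewness(values)) < threshold else "categorical"


# Encoded categorical column with perfectly balanced categories 0, 1, 2.
balanced = [0, 1, 2] * 100
# Encoded categorical column with uneven category counts (made-up data).
unbalanced = [0] * 50 + [1] * 30 + [2] * 20

for name, column in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, suggest_by_unique_count(column), suggest_by_skewness(column))
```

The balanced column is the case Supun describes: its skewness is 0, so
the skewness rule would suggest "numerical". With uneven category counts
the skewness moves away from 0 and the rule suggests "categorical".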


>
> We did have this concern at the beginning as well, regarding how we could
> determine whether a feature is categorical or continuous. Usually this is
> strictly dependent on the domain of the dataset (i.e. the user has to
> decide this with knowledge about the data). That was the idea behind
> letting the user change the data type. But since we needed a default
> option, we had to go for the threshold approach, which was the only option
> we could come up with. I did a bit of research on this too, but found no
> other solution :(
>
> Thanks,
> Supun
>
> On Thu, Aug 13, 2015 at 1:49 AM, Nirmal Fernando <[email protected]> wrote:
>
>> Hi All,
>>
>> We have a feature in ML where we suggest whether a given data column of a
>> dataset is categorical or numerical. Currently, we determine this using a
>> threshold value (the maximum number of categories that a non-string
>> categorical feature can have; if it is exceeded, the feature is treated
>> as a numerical feature). But this is not a reliable measure for most
>> datasets.
>>
>> Can we use the 'skewness' of a distribution as a measure to determine
>> this? Can we say a column is numerical if the absolute value of the
>> skewness of the distribution is less than a certain threshold (say 0.01)?
>>
>> *References*:
>>
>> http://www.itrcweb.org/gsmc-1/Content/GW%20Stats/5%20Methods%20in%20indiv%20Topics/5%206%20Distributional%20Tests.htm
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Team Lead - WSO2 Machine Learner
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/
>>
>>
>>
>
>
> --
> *Supun Sethunga*
> Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324
>



-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
