I mean, a combination of the current approach and skewness?
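
To make the question concrete, here is a rough sketch of one way the two signals could be combined. Everything in it is an assumption for illustration only: the function names, the default of 20 for the maximum number of categories, and the order in which the checks run are hypothetical and are not taken from the ML code base. The 0.01 skewness cut-off is the value Nirmal floated in the thread below.

```python
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Population skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        # A constant column has no spread; treat its skewness as 0.
        return 0.0
    return m3 / m2 ** 1.5


def suggest_column_type(
    values: Sequence[float],
    max_categories: int = 20,      # hypothetical default, not the real setting
    skew_threshold: float = 0.01,  # value suggested in the thread
) -> str:
    """Suggest 'numerical' or 'categorical' for a non-string column."""
    # Current approach: too many distinct values means numerical.
    if len(set(values)) > max_categories:
        return "numerical"
    # Proposed addition: a near-symmetric distribution also means numerical.
    if abs(sample_skewness(values)) < skew_threshold:
        return "numerical"
    return "categorical"
```

With this ordering, a column such as `[0] * 90 + [1] * 10` is suggested as categorical, while a balanced encoded column such as `[0, 1] * 50` has skewness 0 and is suggested as numerical, which is exactly the case Supun is worried about.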

On Fri, Aug 14, 2015 at 8:54 AM, Srinath Perera <[email protected]> wrote:

> Can we use a combination of both?
>
> On Thu, Aug 13, 2015 at 8:46 PM, Supun Sethunga <[email protected]> wrote:
>
>> When a dataset is large, in general it is said to approximate a
>> Normal Distribution. :)  True, it is hypothetical, but the point they make
>> is that when datasets are large, the properties of a distribution, such as
>> skewness and variance, become closer to the properties of a Normal
>> Distribution in most cases.
>>
>> On Thu, Aug 13, 2015 at 11:07 AM, Nirmal Fernando <[email protected]> wrote:
>>
>>> Hi Supun,
>>>
>>> Thanks for the reply.
>>>
>>> On Thu, Aug 13, 2015 at 8:09 PM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Nirmal,
>>>>
>>>> IMO, I don't think we would be able to use skewness in this case. Skewness
>>>> says how symmetric the distribution is. For example, if we consider a
>>>> numerical/continuous feature (not categorical) which is Normally
>>>> Distributed, then the skewness would be 0. Also, for a categorical
>>>> (encoded) feature having a symmetric distribution, the skewness would
>>>> again be 0.
>>>>
>>>
>>> What's the probability of seeing a normal distribution in a real
>>> dataset? IMO it's very low, and since what we're doing here is only a
>>> suggestion, do you see it as an issue?
>>>
>>>
>>>>
>>>> We did have this concern at the beginning as well, regarding how we
>>>> could determine whether a feature is categorical or continuous. Usually
>>>> this depends strictly on the domain of the dataset (i.e. the user has to
>>>> decide this with knowledge about the data). That was the idea behind
>>>> letting the user change the data type. But since we needed a default
>>>> option, we had to go for the threshold approach, which was the only option
>>>> we could come up with. I did a bit of research on this too, but found no
>>>> other solution :(
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>> On Thu, Aug 13, 2015 at 1:49 AM, Nirmal Fernando <[email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We have a feature in ML where we suggest whether a given data column of
>>>>> a dataset is categorical or numerical. Currently, we determine this by
>>>>> using a threshold value (the maximum number of categories that a
>>>>> non-string categorical feature can have; if this is exceeded, the
>>>>> feature is treated as a numerical feature). But this is not a
>>>>> successful measure for most datasets.
>>>>>
>>>>> Can we use the 'skewness' of a distribution as a measure to determine
>>>>> this? Can we say that a column is numerical if the absolute value of the
>>>>> skewness of its distribution is less than a certain threshold (say 0.01)?
>>>>>
>>>>> *References*:
>>>>>
>>>>> http://www.itrcweb.org/gsmc-1/Content/GW%20Stats/5%20Methods%20in%20indiv%20Topics/5%206%20Distributional%20Tests.htm
>>>>>
>>>>> --
>>>>>
>>>>> Thanks & regards,
>>>>> Nirmal
>>>>>
>>>>> Team Lead - WSO2 Machine Learner
>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>> Mobile: +94715779733
>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>
>>
>>
>
>
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
============================
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev