Hi all,

To add to what Supun said: yes, the normal (or Gaussian) distribution is
considered a commonly occurring natural phenomenon. Many ML techniques
assume a Gaussian distribution and work really well on real-world
problems. For example, Gaussian processes assume Gaussian noise in the
dataset.

Also, I don't quite understand the reason for using a categorical
threshold. Why do we need one at all? If the user specifies a field as
categorical, shouldn't we convert it to a categorical variable without
relying on a threshold?
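A minimal Python sketch of that idea, assuming a column arrives as a list
of raw cell values: once the user has chosen the type, the column is
converted directly and no threshold is consulted. The names
`apply_user_type`, `encode_categorical` and the type labels are
hypothetical, not the WSO2 ML API.

```python
from typing import Hashable, Sequence, Union

# Hypothetical type labels, for illustration only.
CATEGORICAL = "CATEGORICAL"
NUMERICAL = "NUMERICAL"


def encode_categorical(values: Sequence[Hashable]) -> list[int]:
    """Map each distinct value to an integer code, in order of first appearance."""
    codes: dict[Hashable, int] = {}
    # setdefault keeps the existing code for a repeated value and assigns
    # the next unused code to a new one.
    return [codes.setdefault(v, len(codes)) for v in values]


def apply_user_type(values: Sequence[str], user_type: str) -> Union[list[int], list[float]]:
    """Convert a column according to the user-specified type; no threshold involved."""
    if user_type == CATEGORICAL:
        return encode_categorical(values)
    if user_type == NUMERICAL:
        return [float(v) for v in values]
    raise ValueError(f"unknown feature type: {user_type!r}")


# A numeric-looking ID column that the user marked as categorical is
# encoded as categories, however many distinct values it has.
print(apply_user_type(["7", "3", "7", "42"], CATEGORICAL))  # [0, 1, 0, 2]
```

In this sketch a threshold would only matter for the default suggestion
shown before the user has made a choice.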
On Aug 14, 2015 2:09 AM, "Seshika Fernando" <sesh...@wso2.com> wrote:

> In addition, there are lots of datasets in economics, stocks and physics
> that are normally or approximately normally distributed, and these will be
> used for predictive modelling.
> On 13 Aug 2015 20:46, "Supun Sethunga" <sup...@wso2.com> wrote:
>
>> When a dataset is large, it is generally said to approximate a Normal
>> Distribution. :) True, it's hypothetical, but the point they make is that
>> when datasets are large, properties of the distribution such as skewness
>> and variance become closer to those of a Normal Distribution in most
>> cases.
>>
>> On Thu, Aug 13, 2015 at 11:07 AM, Nirmal Fernando <nir...@wso2.com>
>> wrote:
>>
>>> Hi Supun,
>>>
>>> Thanks for the reply.
>>>
>>> On Thu, Aug 13, 2015 at 8:09 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>>
>>>> Hi Nirmal,
>>>>
>>>> IMO, I don't think we would be able to use skewness in this case.
>>>> Skewness measures how asymmetric a distribution is. For example, if we
>>>> consider a numerical/continuous (not categorical) feature that is
>>>> Normally Distributed, its skewness would be 0. But a categorical
>>>> (encoded) feature with a symmetric distribution would also have a
>>>> skewness of 0, so skewness alone cannot tell the two apart.
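Supun's point can be checked numerically. A minimal Python sketch,
assuming the plain moment estimator g1 = m3 / m2^(3/2), where
mk = (1/n) * sum((x_i - mean)^k) (some libraries apply a small-sample
correction instead); all feature data below is synthetic.

```python
import random
from typing import Sequence


def skewness(xs: Sequence[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant column")
    return m3 / m2 ** 1.5


# Synthetic continuous feature drawn from a standard normal distribution.
rng = random.Random(42)
normal_feature = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Synthetic categorical feature encoded as codes 0..4 with counts 1, 2, 3, 2, 1,
# i.e. symmetric around the middle code.
symmetric_categorical = [0] + [1] * 2 + [2] * 3 + [3] * 2 + [4]

# Synthetic categorical feature with asymmetric counts 5, 3, 1.
skewed_categorical = [0] * 5 + [1] * 3 + [2]

print(round(skewness(normal_feature), 3))  # close to 0 (sampling noise only)
print(skewness(symmetric_categorical))     # 0.0
print(skewness(skewed_categorical) > 0)    # True
```

Both the normal feature and the symmetric categorical feature land at or
near zero, so a skewness cutoff on its own cannot separate them.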
>>>>
>>>
>>> What's the probability of seeing a normal distribution in a real
>>> dataset? IMO it's very low, and since what we're doing here is only a
>>> suggestion, do you see it as an issue?
>>>
>>>
>>>>
>>>> We had this concern at the beginning as well: how could we determine
>>>> whether a feature is categorical or continuous? Usually this depends
>>>> strictly on the domain of the dataset (i.e. the user has to decide it
>>>> using their knowledge of the data). That was the idea behind letting
>>>> the user change the data type. But since we needed a default option, we
>>>> had to go with the threshold approach, which was the only option we
>>>> could come up with. I did a bit of research on this too, but found no
>>>> other solution :(
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>> On Thu, Aug 13, 2015 at 1:49 AM, Nirmal Fernando <nir...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We have a feature in ML where we suggest whether a given data column
>>>>> of a dataset is categorical or numerical. Currently, we determine this
>>>>> using a threshold value (the maximum number of categories a non-string
>>>>> categorical feature can have; if a feature exceeds it, the feature is
>>>>> treated as numerical). But this is not a reliable measure for most
>>>>> datasets.
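The current heuristic, as described above, can be sketched in Python.
This assumes each column arrives as a list of raw string cells and that
a column counts as a "string feature" when any cell fails to parse as a
number; the function name and the default of 20 are hypothetical, since
the actual threshold value is not stated here.

```python
from typing import Sequence

# Hypothetical default; not the value WSO2 ML actually uses.
DEFAULT_MAX_CATEGORIES = 20


def suggest_feature_type(cells: Sequence[str],
                         max_categories: int = DEFAULT_MAX_CATEGORIES) -> str:
    """Suggest "CATEGORICAL" or "NUMERICAL" for a column of raw string cells."""
    try:
        numbers = [float(c) for c in cells]
    except ValueError:
        # Cells that do not parse as numbers make this a string feature,
        # which is always suggested as categorical.
        return "CATEGORICAL"
    # A non-string feature stays categorical only while its number of distinct
    # values is within the threshold; beyond that it is treated as numerical.
    return "NUMERICAL" if len(set(numbers)) > max_categories else "CATEGORICAL"


print(suggest_feature_type(["red", "green", "red"]))      # CATEGORICAL
print(suggest_feature_type(["1", "2", "3", "2"]))          # CATEGORICAL
print(suggest_feature_type([str(i) for i in range(100)]))  # NUMERICAL
```

The outcome depends only on the count of distinct values, so a genuinely
continuous column with few distinct values in the sample (for example,
integer ages in a small table) is suggested as categorical.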
>>>>>
>>>>> Can we use the skewness of a column's distribution as a measure to
>>>>> determine this? For example, can we say a column is numerical if the
>>>>> absolute value of its skewness is less than a certain threshold (say
>>>>> 0.01)?
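The proposed rule is short to state in code. A minimal Python sketch,
assuming the plain moment estimator of skewness and the example
tolerance of 0.01 from the question; the function names are
hypothetical. The two calls at the end apply the rule to an encoded
categorical column with symmetric counts and to a continuous but
right-skewed column.

```python
from typing import Sequence


def skewness(xs: Sequence[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant column")
    return m3 / m2 ** 1.5


def suggest_by_skewness(values: Sequence[float], tolerance: float = 0.01) -> str:
    """Proposed rule: numerical if |skewness| < tolerance, otherwise categorical."""
    return "NUMERICAL" if abs(skewness(values)) < tolerance else "CATEGORICAL"


# Encoded categorical column with symmetric counts (1, 2, 3, 2, 1).
print(suggest_by_skewness([0, 1, 1, 2, 2, 2, 3, 3, 4]))     # NUMERICAL
# Continuous but right-skewed column.
print(suggest_by_skewness([1, 2, 4, 8, 16, 32, 64, 128]))  # CATEGORICAL
```

Because the rule measures only symmetry, both labels here are the reverse
of the true feature types, which is the concern Supun raises in this
thread.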
>>>>>
>>>>> *References*:
>>>>>
>>>>> http://www.itrcweb.org/gsmc-1/Content/GW%20Stats/5%20Methods%20in%20indiv%20Topics/5%206%20Distributional%20Tests.htm
>>>>>
>>>>> --
>>>>>
>>>>> Thanks & regards,
>>>>> Nirmal
>>>>>
>>>>> Team Lead - WSO2 Machine Learner
>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>> Mobile: +94715779733
>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>>
>>>
>>>
>>
>>
>> --
>> *Supun Sethunga*
>>
>> _______________________________________________
>> Dev mailing list
>> Dev@wso2.org
>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>
>>
>
>
