Hi all,

A residual plot has been added for numerical prediction algorithms. IMO we
should use standard chart types wherever possible, since that reduces user
confusion when reading the visualizations. I think we also need to find
standard chart types for classification algorithms (both binary and
multiclass) [1].

[1] http://oobaloo.co.uk/visualising-classifier-results-with-ggplot2
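
For reference, here is a rough Python sketch of the residual computation
discussed further down this thread. It is only an illustration, not the code
in WSO2 ML. The names (ResidualPoint, residual_points,
classification_residuals, is_misclassified) are hypothetical, and the sign
convention (actual minus predicted) is an assumption, since the thread only
says "the difference of the predicted and actual".

```python
from typing import List, NamedTuple, Optional, Sequence


class ResidualPoint(NamedTuple):
    x: float                  # value of the predictor chosen for the x-axis
    residual: float           # actual - predicted (assumed sign convention)
    category: Optional[str]   # optional categorical feature used for coloring


def residual_points(
    feature: Sequence[float],
    actual: Sequence[float],
    predicted: Sequence[float],
    categories: Optional[Sequence[str]] = None,
) -> List[ResidualPoint]:
    """Build the (x, residual, category) points of a residual plot."""
    if not (len(feature) == len(actual) == len(predicted)):
        raise ValueError("feature, actual and predicted must have equal length")
    if categories is None:
        categories = [None] * len(feature)
    elif len(categories) != len(feature):
        raise ValueError("categories must have the same length as feature")
    return [
        ResidualPoint(x, a - p, c)
        for x, a, p, c in zip(feature, actual, predicted, categories)
    ]


def classification_residuals(
    labels: Sequence[int], probabilities: Sequence[float]
) -> List[float]:
    """Difference between the actual label (0 or 1) and the predicted
    probability of class 1, as suggested for the classification variant."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have equal length")
    return [label - prob for label, prob in zip(labels, probabilities)]


def is_misclassified(label: int, probability: float, threshold: float = 0.5) -> bool:
    """True when thresholding the predicted probability gives the wrong label.
    Treating probability >= threshold as class 1 is an assumption."""
    predicted_label = 1 if probability >= threshold else 0
    return predicted_label != label
```

With a threshold of 0.5, the misclassified points are roughly those whose
classification residual is more than 0.5 away from zero, so they should stand
out on the plot. The points returned by residual_points can be handed to
whatever charting library is in use, with x on the horizontal axis and the
residual on the vertical axis.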

Thanks

On Wed, May 27, 2015 at 5:38 AM, Srinath Perera <srin...@wso2.com> wrote:

> +1. Shall we try those?
> On 26 May 2015 22:52, "Upul Bandara" <u...@wso2.com> wrote:
>
>> +1 for residual plots.
>>
>> Though I haven't used it myself, the residual plot is a useful diagnostic
>> tool for regression models.
>> In particular, non-linearity in a regression model can be easily
>> identified using it.
>>
>> The book "An Introduction to Statistical Learning" [1] (pages 92-96)
>> contains some useful information about residual plots.
>>
>> [1]. http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>>
>> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>
>>> Hi CD,
>>>
>>> As came up in the offline discussion as well, IMHO this plot may not be
>>> the best option for classification. For regression, though, we can use
>>> it with a slight modification: take the difference between the predicted
>>> and actual values (rather than the values themselves) and plot that
>>> against a predictor variable (just as it's being done atm). We can also
>>> add a third variable (a categorical feature) to color the points. This
>>> is a standard plot (AKA residual plot) that is usually used to evaluate
>>> regression models.
>>>
>>> One other thing we can try is doing the same for classification as
>>> well, i.e. taking the difference between the actual probability (0 or 1)
>>> and the predicted probability, plotting that, and seeing whether it gives
>>> a better overall picture. Not sure how it will come out though :) If it
>>> comes out right, then any point whose difference is larger than 0.5 in
>>> magnitude (or the threshold we used) is wrongly classified, and hence we
>>> can get a rough idea of the values of the x-axis feature for which points
>>> get wrongly classified. I mean, we should be able to see a pattern, if
>>> one exists.
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <chathur...@wso2.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Plotting predicted and actual values against a feature doesn't look
>>>> very intuitive, especially for non-probabilistic models. Please check the
>>>> attachments. Any thoughts on making this visualization better?
>>>>
>>>> Thanks
>>>>
>>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <srin...@wso2.com>
>>>> wrote:
>>>>
>>>>> Yes, rerunning the test using a random sample from the test data is OK.
>>>>>
>>>>> --Srinath
>>>>>
>>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <chathur...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Srinath,
>>>>>>
>>>>>> Still, that random sample will not correspond to the predicted vs.
>>>>>> actual values in the test results, given that there is no mapping
>>>>>> between the random sample data points and the test result points. One
>>>>>> thing we can do is run the test separately (using the same model) on
>>>>>> the sampled data for the sole purpose of visualization. Any other
>>>>>> options?
>>>>>>
>>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <srin...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi CD,
>>>>>>>
>>>>>>> Can we take a random sample from the test data and use that for this
>>>>>>> process?
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <chathur...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> To implement $subject in ML we need all feature values of the
>>>>>>>> dataset alongside the predicted and actual values for the test data.
>>>>>>>> But Spark only returns predicted and actual values as test results.
>>>>>>>> Right now we use 10,000 random data rows for other visualizations, and
>>>>>>>> we cannot use the same data for this visualization since those 10,000
>>>>>>>> random rows do not correspond to the test data (the test data is split
>>>>>>>> off from the dataset according to the train data fraction at the model
>>>>>>>> building stage).
>>>>>>>>
>>>>>>>> One option is to persist the test data at the testing stage, but it
>>>>>>>> can be too large for some datasets, depending on the train data
>>>>>>>> fraction. I would appreciate your comments on this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> CD
>>>>>>>>
>>>>>>>> --
>>>>>>>> *CD Athuraliya*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> lean . enterprise . middleware
>>>>>>>> Mobile: +94 716288847 <94716288847>
>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ============================
>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>> Phone: 0772360902
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>
>>
>>
>> --
>> Upul Bandara,
>> Associate Technical Lead, WSO2, Inc.,
>> Mob: +94 715 468 345.
>>
>


-- 
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847 <94716288847>
LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
<https://twitter.com/cdathuraliya> | Blog <http://cdathuraliya.tumblr.com/>
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
