Re: [Dev] [ML] Predicted vs. actuals chart in model summary

CD Athuraliya Wed, 27 May 2015 23:17:25 -0700

Hi all,

A residual plot has been added for numerical prediction algorithms. Using
standard chart types as much as possible is better IMO, since it reduces
user confusion in understanding the visualizations. I think we need to look
for some standard chart types for classification algorithms (both binary
and multiclass) as well [1].


[1] http://oobaloo.co.uk/visualising-classifier-results-with-ggplot2
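
For reference, a minimal Python sketch of the points behind such a residual
plot. The function names and the sign convention (actual minus predicted) are
assumptions for illustration only, not the code used in WSO2 ML.

```python
from typing import List, Sequence, Tuple


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Return actual - predicted for each test point (assumed sign convention)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def residual_points(
    feature: Sequence[float],
    actual: Sequence[float],
    predicted: Sequence[float],
) -> List[Tuple[float, float]]:
    """Pair each feature value with its residual: the (x, y) points to plot."""
    if len(feature) != len(actual):
        raise ValueError("feature and actual must have the same length")
    return list(zip(feature, residuals(actual, predicted)))


# Hypothetical values: one predictor, three test rows.
points = residual_points([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [9.0, 22.0, 30.0])
```

Points scattered evenly around zero suggest a reasonable fit, while a visible
trend along the x-axis hints at non-linearity the model has missed.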

Thanks

On Wed, May 27, 2015 at 5:38 AM, Srinath Perera <[email protected]> wrote:

> +1. Shall we try those?
> On 26 May 2015 22:52, "Upul Bandara" <[email protected]> wrote:
>
>> +1 for residual plots.
>>
>> Though I haven't used it myself, the residual plot is a useful
>> diagnostic tool for regression models. In particular, non-linearity in
>> regression models can be easily identified using it.
>>
>> The book "An Introduction to Statistical Learning" [1] (pages 92-96)
>> contains some useful information about residual plots.
>>
>> [1]. http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>>
>> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi CD,
>>>
>>> As came up in the offline discussion as well, IMHO this plot may not be
>>> the best option for classification. For regression, though, we can use
>>> it with a slight modification: take the difference between the predicted
>>> and actual values (rather than the values themselves) and plot that
>>> against a predictor variable (just like it's being done atm). We can also
>>> add a third variable (a categorical feature) to color the points. This is
>>> a standard plot (AKA residual plot) which is usually used to evaluate
>>> regression models.
>>>
>>> One other thing we can try out is doing the same for classification as
>>> well, i.e. taking the difference between the actual probability (0 or 1)
>>> and the predicted probability, plotting that, and seeing whether it gives
>>> a better overall picture. Not sure how it will come out though :) If it
>>> comes out right, then any point whose absolute difference lies above 0.5
>>> (or the threshold we used) is wrongly classified, and hence we can get a
>>> rough idea of the x-axis feature values for which points get wrongly
>>> classified. I mean, we should be able to see a pattern, if one exists.
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Plotting predicted and actual values against a feature doesn't look
>>>> very intuitive, especially for non-probabilistic models. Please check
>>>> the attachments. Any thoughts on making this visualization better?
>>>>
>>>> Thanks
>>>>
>>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <[email protected]>
>>>> wrote:
>>>>
>>>>> Yes, rerunning the test using a random sample from the test data is OK.
>>>>>
>>>>> --Srinath
>>>>>
>>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Srinath,
>>>>>>
>>>>>> Still, that random sample will not correspond to the predicted vs.
>>>>>> actual values in the test results, given that there is no mapping
>>>>>> between random sample data points and test result points. One thing we
>>>>>> can do is run the test separately (using the same model) on the sampled
>>>>>> data for the sole purpose of visualization. Any other options?
>>>>>>
>>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi CD,
>>>>>>>
>>>>>>> Can we take a random sample from the test data and use that for this
>>>>>>> process?
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> To implement $subject in ML we need all feature values of the
>>>>>>>> dataset against predicted and actual values for the test data. But
>>>>>>>> Spark only returns predicted and actual values as test results. Right
>>>>>>>> now we use 10,000 random data rows for other visualizations, and we
>>>>>>>> cannot use the same data for this visualization since those 10,000
>>>>>>>> random rows do not correspond to the test data (the test data is a
>>>>>>>> subset taken from the dataset according to the train data fraction at
>>>>>>>> the model building stage).
>>>>>>>>
>>>>>>>> One option is to persist the test data at the testing stage, but it
>>>>>>>> can be too large for some datasets, depending on the train data
>>>>>>>> fraction. I would appreciate your comments on this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> CD
>>>>>>>>
>>>>>>>> --
>>>>>>>> *CD Athuraliya*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> lean . enterprise . middleware
>>>>>>>> Mobile: +94 716288847 <94716288847>
>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ============================
>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>> Phone: 0772360902
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>
>>
>>
>> --
>> Upul Bandara,
>> Associate Technical Lead, WSO2, Inc.,
>> Mob: +94 715 468 345.
>>
>


-- 
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847 <94716288847>
LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
<https://twitter.com/cdathuraliya> | Blog <http://cdathuraliya.tumblr.com/>

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev