Re: [Dev] [ML] Predicted vs. actuals chart in model summary

Nirmal Fernando Wed, 27 May 2015 23:23:35 -0700

Great work CD!

On Thu, May 28, 2015 at 11:46 AM, CD Athuraliya <chathur...@wso2.com> wrote:


> Hi all,
>
> A residual plot has been added for numerical prediction algorithms. IMO it is
> better to use standard chart types as much as possible, since that reduces
> user confusion in understanding the visualizations. I think we need to look
> for some standard chart types for classification algorithms (both binary and
> multiclass) as well [1].
>
> [1] http://oobaloo.co.uk/visualising-classifier-results-with-ggplot2
>
> Thanks
>
> On Wed, May 27, 2015 at 5:38 AM, Srinath Perera <srin...@wso2.com> wrote:
>
>> +1 shall we try those?
>> On 26 May 2015 22:52, "Upul Bandara" <u...@wso2.com> wrote:
>>
>>> +1 for residual plots.
>>>
>>> Though I haven't used it myself, the residual plot is a useful diagnostic
>>> tool for regression models.
>>> In particular, non-linearity in regression models can be easily identified
>>> using it.
>>>
>>> The book "An Introduction to Statistical Learning" [1] (pages 92-96)
>>> contains some useful information about residual plots.
>>>
>>> [1]. http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>>>
>>> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>>
>>>> Hi CD,
>>>>
>>>> As came up in the offline discussion as well, IMHO this plot may not be
>>>> the best option for classification. For regression we can use it with a
>>>> slight modification: take the difference between the predicted and actual
>>>> values (rather than the values themselves) and plot that against a
>>>> predictor variable (just as is done at the moment). We can also add a third
>>>> variable (a categorical feature) to color the points. This is a standard
>>>> plot (AKA a residual plot) that is commonly used to evaluate regression
>>>> models.
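
As a rough illustration of the residual plot described above, here is a minimal Python sketch (not WSO2 ML code). The function name `residual_points` and the sample values are made up for illustration, and the sign convention (actual minus predicted) is an assumption, since the thread only says "difference". It produces (x, residual, category) triples that a charting layer could draw as a scatter plot colored by category.

```python
from typing import List, Optional, Sequence, Tuple

# One plotted point: (predictor value, residual, optional category for coloring)
Point = Tuple[float, float, Optional[str]]


def residual_points(
    feature: Sequence[float],
    actual: Sequence[float],
    predicted: Sequence[float],
    category: Optional[Sequence[str]] = None,
) -> List[Point]:
    """Build residual-plot points; residual = actual - predicted (assumed convention)."""
    if not (len(feature) == len(actual) == len(predicted)):
        raise ValueError("feature, actual and predicted must have equal length")
    if category is not None and len(category) != len(feature):
        raise ValueError("category must have the same length as feature")

    points: List[Point] = []
    for i, x in enumerate(feature):
        residual = actual[i] - predicted[i]
        label = category[i] if category is not None else None
        points.append((x, residual, label))
    return points


# Toy data, purely illustrative.
points = residual_points(
    feature=[1.0, 2.0, 3.0],
    actual=[2.0, 4.0, 6.5],
    predicted=[2.5, 4.0, 6.0],
    category=["a", "b", "a"],
)
```

A well-fitting model should give residuals scattered around zero with no visible trend along the x-axis; a curve or funnel shape would hint at non-linearity or non-constant variance.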
>>>>
>>>> One other thing we can try is doing the same for classification, i.e.
>>>> taking the difference between the actual probability (0 or 1) and the
>>>> predicted probability, plotting that, and seeing whether it gives a better
>>>> overall picture. Not sure how it will come out though :) If it comes out
>>>> right, any point whose difference lies above 0.5 in absolute value (or
>>>> above the threshold we used) is wrongly classified, so we can get a rough
>>>> idea of the x-axis feature values for which points get wrongly classified.
>>>> I mean, we should be able to see a pattern, if one exists.
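
The classification variant suggested above could be sketched roughly as follows, in plain Python. The name `classification_residuals`, the 0.5 default threshold, and the toy numbers are assumptions for illustration only. Each point carries the difference between the actual label (0 or 1) and the predicted probability, plus a flag marking whether thresholding the probability gives the wrong class.

```python
from typing import List, Sequence, Tuple

# One plotted point: (feature value, actual - probability, misclassified?)
ClassPoint = Tuple[float, float, bool]


def classification_residuals(
    feature: Sequence[float],
    actual: Sequence[int],
    probability: Sequence[float],
    threshold: float = 0.5,
) -> List[ClassPoint]:
    """Pair each feature value with (actual - predicted probability)."""
    if not (len(feature) == len(actual) == len(probability)):
        raise ValueError("all inputs must have equal length")

    rows: List[ClassPoint] = []
    for x, y, p in zip(feature, actual, probability):
        diff = y - p
        # Assumed decision rule: predict class 1 when p >= threshold.
        predicted_label = 1 if p >= threshold else 0
        rows.append((x, diff, predicted_label != y))
    return rows


# Toy data, purely illustrative.
rows = classification_residuals(
    feature=[10.0, 20.0, 30.0, 40.0],
    actual=[1, 0, 1, 0],
    probability=[0.75, 0.25, 0.25, 0.75],
)
misclassified_x = [x for x, _, wrong in rows if wrong]
```

With a 0.5 threshold the flagged points are the ones whose difference exceeds 0.5 in absolute value (ties exactly at the threshold aside), so plotting the difference against a feature shows where along that feature the misclassifications cluster.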
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <chathur...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Plotting predicted and actual values against a feature doesn't look
>>>>> very intuitive, especially for non-probabilistic models. Please check the
>>>>> attachments. Any thoughts on making this visualization better?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <srin...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, rerunning the test on a random sample from the test data is OK.
>>>>>>
>>>>>> --Srinath
>>>>>>
>>>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <chathur...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Srinath,
>>>>>>>
>>>>>>> Still, that random sample will not correspond to the predicted vs. actual
>>>>>>> values in the test results, since there is no mapping between the random
>>>>>>> sample data points and the test result points. One thing we can do is run
>>>>>>> the test separately (using the same model) on the sampled data for the
>>>>>>> sole purpose of visualization. Any other options?
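
The idea of re-running the same model on a sample of the test data, so that each prediction stays attached to its feature values, might look roughly like the sketch below. It is plain Python rather than Spark code; `sample_and_score`, the `predict` callable, and the toy model are hypothetical stand-ins. In Spark the sample itself could presumably be drawn with something like `RDD.takeSample`, but that is an assumption about how it would be wired in, not what WSO2 ML does.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A test row: (feature vector, actual value)
Row = Tuple[Sequence[float], float]
# A scored row: (feature vector, actual value, predicted value)
Scored = Tuple[Sequence[float], float, float]


def sample_and_score(
    test_rows: Sequence[Row],
    predict: Callable[[Sequence[float]], float],
    sample_size: int,
    seed: int = 42,
) -> List[Scored]:
    """Sample test rows, then predict on exactly those rows.

    Because prediction runs on the sampled rows themselves, every
    (actual, predicted) pair keeps its feature values for plotting.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(test_rows))
    sampled = rng.sample(list(test_rows), k)
    return [(features, actual, predict(features)) for features, actual in sampled]


# Toy test set and toy "model", purely illustrative.
test_rows = [([float(i)], 2.0 * i) for i in range(100)]


def toy_predict(features: Sequence[float]) -> float:
    return 2.0 * features[0] + 1.0


scored = sample_and_score(test_rows, toy_predict, sample_size=10)
```

Only the sampled rows and their scores need to be persisted for the chart, which sidesteps storing the full test set.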
>>>>>>>
>>>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <srin...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi CD,
>>>>>>>>
>>>>>>>> Can we take a random sample from the test data and use that for
>>>>>>>> this process?
>>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <chathur...@wso2.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> To implement $subject in ML we need all feature values of the
>>>>>>>>> dataset alongside the predicted and actual values for the test data.
>>>>>>>>> But Spark only returns predicted and actual values as test results.
>>>>>>>>> Right now we use 10,000 random data rows for other visualizations, and
>>>>>>>>> we cannot use the same data for this visualization since those 10,000
>>>>>>>>> random rows do not correspond to the test data (the test data is split
>>>>>>>>> off from the dataset according to the train data fraction at the model
>>>>>>>>> building stage).
>>>>>>>>>
>>>>>>>>> One option is to persist the test data at the testing stage, but it can
>>>>>>>>> be too large for some datasets, depending on the train data fraction.
>>>>>>>>> I would appreciate your comments on this.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> CD
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *CD Athuraliya*
>>>>>>>>> Software Engineer
>>>>>>>>> WSO2, Inc.
>>>>>>>>> lean . enterprise . middleware
>>>>>>>>> Mobile: +94 716288847 <94716288847>
>>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>
>>>
>>>
>>> --
>>> Upul Bandara,
>>> Associate Technical Lead, WSO2, Inc.,
>>> Mob: +94 715 468 345.
>>>
>>
>
>
>



-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/

_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
