Hi all,

A residual plot has been added for numerical prediction algorithms. IMO it is better to use standard chart types as much as possible, since that reduces user confusion when reading the visualizations. I think we should look for some standard chart types for classification algorithms (both binary and multiclass) as well [1].

[1] http://oobaloo.co.uk/visualising-classifier-results-with-ggplot2

Thanks

On Wed, May 27, 2015 at 5:38 AM, Srinath Perera <srin...@wso2.com> wrote:

> +1 shall we try those?
> On 26 May 2015 22:52, "Upul Bandara" <u...@wso2.com> wrote:
>
>> +1 for residual plots.
>>
>> Though I haven't used it myself, the residual plot is a useful diagnostic
>> tool for regression models.
>> In particular, non-linearity in regression models can be easily identified
>> using it.
>>
>> The book "An Introduction to Statistical Learning" [1] (pages 92-96) contains
>> some useful information about residual plots.
>>
>> [1]. http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>>
>> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>
>>> Hi CD,
>>>
>>> As came up in the offline discussion as well, IMHO, for
>>> classification, this plot may not be the best option. But for regression,
>>> we can actually use this plot with a slight modification: take the
>>> difference between the predicted and actual values (rather than the values
>>> themselves) and plot that against a predictor variable (just as it's
>>> done atm). We can also add a third variable (a categorical feature) to color
>>> the points. This is a standard plot (AKA residual plot) which is usually
>>> used to evaluate regression models.
>>>
>>> One other thing we can try out is doing the same for classification as
>>> well, i.e. taking the difference between the actual probability (0 or 1)
>>> and the predicted probability, plotting that, and seeing whether it gives a
>>> better overall picture. Not sure how it will come out though :) If it comes
>>> out right, then any point whose difference lies above 0.5 in magnitude (or
>>> the threshold we used) is wrongly classified, and hence we can get a rough
>>> idea of the values of the x-axis feature for which points get wrongly
>>> classified. I mean, we should be able to see a pattern, if one exists.
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <chathur...@wso2.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Plotting predicted and actual values against a feature doesn't look
>>>> very intuitive, especially for non-probabilistic models. Please check the
>>>> attachments. Any thoughts on making this visualization better?
>>>>
>>>> Thanks
>>>>
>>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <srin...@wso2.com>
>>>> wrote:
>>>>
>>>>> Yes, rerunning on a random sample from the test data is OK.
>>>>>
>>>>> --Srinath
>>>>>
>>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <chathur...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Srinath,
>>>>>>
>>>>>> Still, that random sample will not correspond to the predicted vs. actual
>>>>>> values in the test results, given that there is no mapping between random
>>>>>> sample data points and test result points. One thing we can do is run the
>>>>>> test separately (using the same model) on the sampled data for the sole
>>>>>> purpose of visualization. Any other options?
>>>>>>
>>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <srin...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi CD,
>>>>>>>
>>>>>>> Can we take a random sample from the test data and use that for this
>>>>>>> process?
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <chathur...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> To implement $subject in ML we need all feature values of the
>>>>>>>> dataset alongside the predicted and actual values for the test data.
>>>>>>>> But Spark only returns predicted and actual values as test results.
>>>>>>>> Right now we use 10,000 random data rows for other visualizations, and
>>>>>>>> we cannot use the same data for this visualization since those 10,000
>>>>>>>> random rows do not correspond to the test data (the test data is split
>>>>>>>> off from the dataset according to the train data fraction at the model
>>>>>>>> building stage).
>>>>>>>>
>>>>>>>> One option is to persist the test data at the testing stage, but it can
>>>>>>>> be too large for some datasets, depending on the train data fraction.
>>>>>>>> I would appreciate your comments on this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> CD
>>>>>>>>
>>>>>>>> --
>>>>>>>> *CD Athuraliya*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> lean . enterprise . middleware
>>>>>>>> Mobile: +94 716288847 <94716288847>
>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ============================
>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>> Phone: 0772360902
>>>>>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>
>> --
>> Upul Bandara,
>> Associate Technical Lead, WSO2, Inc.,
>> Mob: +94 715 468 345.
>>
>

--
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847 <94716288847>
LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
<https://twitter.com/cdathuraliya> | Blog
<http://cdathuraliya.tumblr.com/>
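
For reference, the residuals discussed in this thread could be computed roughly as in the Python sketch below. This is only an illustration under stated assumptions: the function names are made up and are not part of WSO2 ML or Spark, the residual is taken as actual minus predicted (the usual convention; the thread does not fix the sign), and the misclassification check assumes a decision threshold of 0.5. Plotting the residuals against a feature is left to whatever chart library is in use.

```python
from typing import List, Sequence


def regression_residuals(actual: Sequence[float],
                         predicted: Sequence[float]) -> List[float]:
    """Return actual - predicted for each test row (assumed sign convention)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def classification_residuals(labels: Sequence[int],
                             probabilities: Sequence[float]) -> List[float]:
    """Return label (0 or 1) minus the predicted probability of class 1."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    return [y - p for y, p in zip(labels, probabilities)]


def misclassified(residuals: Sequence[float]) -> List[bool]:
    """Flag rows whose residual magnitude exceeds 0.5.

    Assumes class 1 is predicted when the probability is above 0.5.
    Rows with a probability of exactly 0.5 are not flagged here.
    """
    return [abs(r) > 0.5 for r in residuals]


if __name__ == "__main__":
    # Made-up values, only to show the calls.
    print(regression_residuals([2.0, 4.0, 6.5], [2.5, 4.0, 6.0]))
    flags = misclassified(
        classification_residuals([1, 0, 1, 0], [0.9, 0.2, 0.25, 0.75]))
    print(flags)
```

Each residual would then be plotted on the y-axis against the chosen feature on the x-axis, optionally colored by a categorical feature as suggested above.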
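
The sampling approach agreed on earlier in the thread (take a random sample of the test data and re-run the model on it, so each prediction stays attached to its feature values) could look roughly like the sketch below. Everything here is a hypothetical stand-in: `sample_with_predictions`, the `predict` callable, and the `(features, actual)` row layout are assumptions for illustration, not the WSO2 ML or Spark API.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical row layout: (feature values, actual value).
Row = Tuple[Sequence[float], float]


def sample_with_predictions(
    test_rows: Sequence[Row],
    predict: Callable[[Sequence[float]], float],
    sample_size: int,
    seed: Optional[int] = None,
) -> List[Tuple[Sequence[float], float, float]]:
    """Sample test rows and re-run the model on the sample only.

    Returns (features, actual, predicted) triples, so every prediction
    keeps the feature values needed for plotting.
    """
    rng = random.Random(seed)
    # Never ask for more rows than the test set contains.
    k = min(sample_size, len(test_rows))
    sampled = rng.sample(list(test_rows), k)
    return [(features, actual, predict(features))
            for features, actual in sampled]
```

Only the sampled rows are kept, which avoids persisting the whole test set when the train data fraction leaves it large.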