Great work, CD!

On Thu, May 28, 2015 at 11:46 AM, CD Athuraliya <chathur...@wso2.com> wrote:
> Hi all,
>
> A residual plot has been added for numerical prediction algorithms. Using
> standard chart types as much as possible is better IMO. It will reduce user
> confusion in understanding visualizations. I think we need to look for some
> standard chart types for classification algorithms (both binary and
> multiclass) as well [1].
>
> [1] http://oobaloo.co.uk/visualising-classifier-results-with-ggplot2
>
> Thanks
>
> On Wed, May 27, 2015 at 5:38 AM, Srinath Perera <srin...@wso2.com> wrote:
>
>> +1 shall we try those?
>> On 26 May 2015 22:52, "Upul Bandara" <u...@wso2.com> wrote:
>>
>>> +1 for residual plots.
>>>
>>> Though I haven't used it myself, the residual plot is a useful diagnostic
>>> tool for regression models.
>>> In particular, non-linearity in regression models can be easily identified
>>> using it.
>>>
>>> The book "An Introduction to Statistical Learning" [1] (pages 92-96)
>>> contains some useful information about residual plots.
>>>
>>> [1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>>>
>>> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>>
>>>> Hi CD,
>>>>
>>>> As came up in the offline discussion as well, IMHO this plot may not be
>>>> the best option for classification. For regression, however, we can use
>>>> this plot with a slight modification: take the difference between the
>>>> predicted and actual values (rather than the values themselves) and plot
>>>> that against a predictor variable (just as it's done atm). We can also
>>>> add a third variable (a categorical feature) to color the points. This is
>>>> a standard plot (AKA residual plot) that is usually used to evaluate
>>>> regression models.
>>>>
>>>> One other thing we can try out is doing the same for classification as
>>>> well, i.e. taking the difference between the actual probability (0 or 1)
>>>> and the predicted probability, plotting that, and seeing whether it gives
>>>> a better overall picture. Not sure how it will come out though :) If it
>>>> comes out right, then any point that lies above 0.5 (or the threshold we
>>>> used) is wrongly classified, and hence we can get a rough idea of the
>>>> x-axis feature values for which points get wrongly classified. I mean, we
>>>> should be able to see a pattern, if one exists.
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <chathur...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Plotting predicted and actual values against a feature doesn't look
>>>>> very intuitive, especially for non-probabilistic models. Please check
>>>>> the attachments. Any thoughts on making this visualization better?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <srin...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, rerunning using a random sample from the test data is OK.
>>>>>>
>>>>>> --Srinath
>>>>>>
>>>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <chathur...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Srinath,
>>>>>>>
>>>>>>> That random sample will still not correspond to the predicted vs.
>>>>>>> actual values in the test results, given that there is no mapping
>>>>>>> between the random sample data points and the test result points.
>>>>>>> One thing we can do is run the test separately (using the same
>>>>>>> model) on the sampled data for the sole purpose of visualization.
>>>>>>> Any other options?
>>>>>>>
>>>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <srin...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi CD,
>>>>>>>>
>>>>>>>> Can we take a random sample from the test data and use that for
>>>>>>>> this process?
>>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya
>>>>>>>> <chathur...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> To implement $subject in ML we need all feature values of the
>>>>>>>>> dataset against predicted and actual values for test data. But
>>>>>>>>> Spark only returns predicted and actual values as test results.
>>>>>>>>> Right now we use a random 10,000 data rows for other
>>>>>>>>> visualizations, and we cannot use the same data for this
>>>>>>>>> visualization since those random 10,000 rows do not correspond to
>>>>>>>>> the test data (the test data is split off from the dataset
>>>>>>>>> according to the train data fraction at the model building stage).
>>>>>>>>>
>>>>>>>>> One option is to persist the test data at the testing stage, but
>>>>>>>>> it can be too large for some datasets, depending on the train data
>>>>>>>>> fraction. Appreciate it if you can give your comments on this.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> CD
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *CD Athuraliya*
>>>>>>>>> Software Engineer
>>>>>>>>> WSO2, Inc.
>>>>>>>>> lean . enterprise . middleware
>>>>>>>>> Mobile: +94 716288847
>>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>
>>> --
>>> Upul Bandara,
>>> Associate Technical Lead, WSO2, Inc.,
>>> Mob: +94 715 468 345.
>>>
>

--
Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
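
For anyone following the thread, here is a rough Python sketch of the idea discussed above: re-run the model on a random sample of the test data, then plot (predicted - actual) against one feature. It is not the WSO2 ML implementation. The helper names, the `predict` callable and the row layout `(features, actual)` are all hypothetical, and the misclassification check assumes a decision threshold of 0.5.

```python
import random


def sample_test_rows(test_rows, k, seed=0):
    """Pick up to k rows at random from the test set (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(test_rows, min(k, len(test_rows)))


def residual_points(rows, predict, feature_index):
    """Return (feature value, predicted - actual) pairs for a residual plot.

    rows: list of (features, actual) tuples (assumed layout).
    predict: any callable mapping features to a prediction; it stands in
    for re-running the already-built model on the sampled rows.
    """
    points = []
    for features, actual in rows:
        predicted = predict(features)
        points.append((features[feature_index], predicted - actual))
    return points


def is_misclassified(actual_label, predicted_prob):
    """Classification variant: |actual - predicted probability| > 0.5.

    Only valid when the decision threshold is 0.5 (an assumption here).
    """
    return abs(actual_label - predicted_prob) > 0.5
```

Because the sampled rows carry their feature values along with the actual value, re-running the model on them sidesteps the missing mapping between the sample and the test results that the thread describes.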
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev