Hi CD,

As came up in the offline discussion as well, IMHO this plot may not be the best option for classification. For regression, though, we can use it with a slight modification: take the difference between the predicted and actual values (rather than the values themselves) and plot that against a predictor variable (just as is done at the moment). We can also add a third variable (a categorical feature) to color the points. This is a standard plot (AKA residual plot) that is usually used to evaluate regression models.
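
To make the regression case concrete, here is a rough sketch in plain Python of the data such a residual plot needs. The function name `residual_points` and the sample values are made up for illustration and are not taken from our ML code; I'm assuming residual = predicted - actual.

```python
def residual_points(feature_values, actual, predicted, categories=None):
    """Return one (x, residual, category) tuple per test row.

    residual = predicted - actual. The category is None when no
    categorical feature is supplied for coloring the points.
    """
    n = len(actual)
    if len(feature_values) != n or len(predicted) != n:
        raise ValueError("feature_values, actual and predicted must have equal length")
    if categories is None:
        categories = [None] * n
    elif len(categories) != n:
        raise ValueError("categories must have the same length as actual")
    return [
        (x, p - a, c)
        for x, a, p, c in zip(feature_values, actual, predicted, categories)
    ]


# Hypothetical sample data, for illustration only.
points = residual_points(
    feature_values=[1.0, 2.0, 3.0],
    actual=[10.0, 20.0, 30.0],
    predicted=[12.0, 19.0, 30.0],
    categories=["a", "b", "a"],
)
```

Drawing x against the residual as a scatter, colored by category, gives the residual plot. Points scattered evenly around 0 suggest a reasonable fit, while a visible trend along the x-axis suggests the model is missing something for that feature.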

One other thing we can try out is doing the same for classification as well, i.e. taking the difference between the actual label (0 or 1) and the predicted probability, plotting that, and seeing whether it gives a better overall picture. Not sure how it will come out though :) If it comes out right, then any point whose absolute difference lies above 0.5 is wrongly classified (for a 0.5 threshold; the cut-off shifts if we use a different one), and hence we can get a rough idea of the x-axis feature values for which points get wrongly classified. I mean, we should be able to see a pattern, if one exists.

Thanks,
Supun

On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <chathur...@wso2.com> wrote:

> Hi,
>
> Plotting predicted and actual values against a feature doesn't look very
> intuitive, especially for non-probabilistic models. Please check the
> attachments. Any thoughts on making this visualization better?
>
> Thanks
>
> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <srin...@wso2.com> wrote:
>
>> Yes, rerunning using a random sample from the test data is OK.
>>
>> --Srinath
>>
>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <chathur...@wso2.com>
>> wrote:
>>
>>> Hi Srinath,
>>>
>>> Still, that random sample will not correspond to the predicted vs. actual
>>> values in the test results, given that there is no mapping between the
>>> random sample data points and the test result points. One thing we can do
>>> is run the test separately (using the same model) on the sampled data for
>>> the sole purpose of visualization. Any other options?
>>>
>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <srin...@wso2.com>
>>> wrote:
>>>
>>>> Hi CD,
>>>>
>>>> Can we take a random sample from the test data and use that for this
>>>> process?
>>>>
>>>> --Srinath
>>>>
>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <chathur...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> To implement $subject in ML we need all feature values of the dataset
>>>>> against the predicted and actual values for the test data. But Spark
>>>>> only returns predicted and actual values as test results. Right now we
>>>>> use 10,000 random data rows for other visualizations, and we cannot use
>>>>> the same data for this visualization since those 10,000 random rows do
>>>>> not correspond to the test data (the test data is subtracted from the
>>>>> dataset according to the train data fraction at the model building
>>>>> stage).
>>>>>
>>>>> One option is to persist the test data at the testing stage, but it can
>>>>> be too large for some datasets, depending on the train data fraction.
>>>>> I'd appreciate your comments on this.
>>>>>
>>>>> Thanks,
>>>>> CD
>>>>>
>>>>> --
>>>>> *CD Athuraliya*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> lean . enterprise . middleware
>>>>> Mobile: +94 716288847
>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>> <http://cdathuraliya.tumblr.com/>
>>>>
>>>> --
>>>> ============================
>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>> Site: http://people.apache.org/~hemapani/
>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>> Phone: 0772360902

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324

_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev