Hi CD,

As this came up in the offline discussion as well, IMHO this plot may not be
the best option for classification. But for regression, we can actually use
this plot with a slight modification: take the difference between the
predicted and actual values (rather than the values themselves), and plot
that against a predictor variable (just as it's being done at the moment). We
can also add a third variable (a categorical feature) to color the points.
This is a standard plot (AKA a residual plot) which is usually used to
evaluate regression models.
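
To make the idea concrete, here is a minimal sketch of preparing the points for
such a residual plot. The function names, the sample data, and the sign
convention (predicted minus actual) are assumptions for illustration only, not
anything taken from the ML code base; the actual charting is left to whichever
library the UI uses.

```python
from collections import defaultdict


def residual_points(feature, actual, predicted, category=None):
    """Return (x, residual, label) triples; residual = predicted - actual."""
    if not (len(feature) == len(actual) == len(predicted)):
        raise ValueError("feature, actual and predicted must have equal length")
    # Without a categorical feature, every point gets the label None.
    labels = category if category is not None else [None] * len(feature)
    return [(x, p - a, c)
            for x, a, p, c in zip(feature, actual, predicted, labels)]


def group_by_label(points):
    """Group (x, residual) pairs by label, one series per color."""
    groups = defaultdict(list)
    for x, residual, label in points:
        groups[label].append((x, residual))
    return dict(groups)


# Hypothetical sample: one predictor, actual/predicted values, one category.
feature = [1.0, 2.0, 3.0, 4.0]
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 19.0, 33.0, 40.0]
category = ["a", "b", "a", "b"]

points = residual_points(feature, actual, predicted, category)
series = group_by_label(points)
```

Each entry in `series` would then be drawn as one colored scatter series, with
the predictor on the x-axis and the residual on the y-axis.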

One other thing we can try out is doing the same for classification as
well, i.e. taking the difference between the actual probability (0 or 1)
and the predicted probability, plotting that, and seeing whether it gives a
better overall picture. Not sure how it will come out though :) If it comes
out right, then any point whose absolute difference lies above 0.5 (or the
threshold we used) is wrongly classified, and hence we can get a rough idea
of the values of the x-axis feature for which points get wrongly
classified. I mean, we should be able to see a pattern, if one exists.
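
A rough sketch of the classification variant, under the assumption that the
model gives a predicted probability for class 1 and that the decision threshold
is 0.5. The names and the sample data are hypothetical. Since the difference
can be negative, the check uses its absolute value.

```python
def probability_gaps(feature, actual, predicted_prob):
    """Return (x, actual - predicted probability) pairs for plotting."""
    if not (len(feature) == len(actual) == len(predicted_prob)):
        raise ValueError("inputs must have equal length")
    return [(x, a - p) for x, a, p in zip(feature, actual, predicted_prob)]


def misclassified(points, threshold=0.5):
    """Keep the points whose absolute gap exceeds the threshold.

    This matches "wrongly classified" only for a 0.5 decision threshold
    (assumption); other thresholds are not symmetric between the classes.
    """
    return [(x, gap) for x, gap in points if abs(gap) > threshold]


# Hypothetical sample: actual labels are 0 or 1, predictions are probabilities.
feature = [0.1, 0.4, 0.7, 0.9]
actual = [0, 1, 1, 0]
predicted_prob = [0.25, 0.75, 0.25, 0.75]

gaps = probability_gaps(feature, actual, predicted_prob)
wrong = misclassified(gaps)
```

Plotting `gaps` against the feature and highlighting `wrong` should show
whether the misclassified points cluster in some range of the x-axis feature.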

Thanks,
Supun

On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <chathur...@wso2.com> wrote:

> Hi,
>
> Plotting predicted and actual values against a feature doesn't look very
> intuitive, especially for non-probabilistic models. Please check the
> attachments. Any thoughts on making this visualization better?
>
> Thanks
>
> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <srin...@wso2.com> wrote:
>
>> yes, rerun using a random sample from test data is OK.
>>
>> --Srinath
>>
>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <chathur...@wso2.com>
>> wrote:
>>
>>> Hi Srinath,
>>>
>>> Still, that random sample will not correspond to the predicted vs. actual
>>> values in the test results, given that there is no mapping between the
>>> random sample data points and the test result points. One thing we can do
>>> is run the test separately (using the same model) on sampled data for the
>>> sole purpose of visualization. Any other options?
>>>
>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <srin...@wso2.com>
>>> wrote:
>>>
>>>> Hi CD,
>>>>
>>>> Can we take a random sample from the test data and use that for this
>>>> process?
>>>>
>>>> --Srinath
>>>>
>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <chathur...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> To implement $subject in ML we need all feature values of the dataset
>>>>> against predicted and actual values for test data. But Spark only returns
>>>>> predicted and actual values as test results. Right now we use 10,000
>>>>> random data rows for other visualizations, and we cannot use the same
>>>>> data for this visualization since those random 10,000 rows do not
>>>>> correspond to the test data (test data is split off from the dataset
>>>>> according to the train data fraction at the model building stage).
>>>>>
>>>>> One option is to persist test data at the testing stage, but it can be
>>>>> too large for some datasets, depending on the train data fraction. I'd
>>>>> appreciate your comments on this.
>>>>>
>>>>> Thanks,
>>>>> CD
>>>>>
>>>>> --
>>>>> *CD Athuraliya*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> lean . enterprise . middleware
>>>>> Mobile: +94 716288847 <94716288847>
>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ============================
>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>> Site: http://people.apache.org/~hemapani/
>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>> Phone: 0772360902
>>>>
>>>
>>
>



-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev