Re: [R] question about linear regression and leverage

2011-06-21 Thread George Markomanolis
Dear David,

Thanks for your answer. Yes, now that you mention it, these points are
at the beginning of a variable's range. The residual plot suggests
non-constant variance, which a transformation resolves. I also checked
for interactions by using the : operator between two variables, and the
results changed only slightly. I work in computer science, but I wanted
to do the analysis from scratch because some previous results that I
have seen are not good for such cases. Moreover, the data are of course
not the same.
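
A minimal sketch of the two checks described above, on simulated data.
The names x1, x2 and y, the log transformation, and the simulated model
are assumptions for illustration only; they are not the real variables
or the transformation actually used.

```r
# Simulated data only; x1, x2 and y are hypothetical names.
set.seed(1)
n  <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
# Multiplicative noise, so the spread of y grows with its mean.
y  <- exp(1 + 0.15 * x1 + 0.05 * x2 + rnorm(n, sd = 0.4))
dat <- data.frame(y, x1, x2)

fit_raw <- lm(y ~ x1 + x2, data = dat)       # untransformed response
fit_log <- lm(log(y) ~ x1 + x2, data = dat)  # log-transformed response

# Crude numeric check for non-constant variance: does the size of the
# residuals grow with the fitted values?
# (plot(fit_raw, which = 1) shows the same thing graphically.)
cor_raw <- cor(abs(resid(fit_raw)), fitted(fit_raw))
cor_log <- cor(abs(resid(fit_log)), fitted(fit_log))
print(c(raw = cor_raw, log = cor_log))

# Add the x1:x2 interaction and compare the nested models with an F test.
fit_int <- update(fit_log, . ~ . + x1:x2)
cmp <- anova(fit_log, fit_int)
print(cmp)
```

In a formula, x1:x2 adds only the interaction term, while x1*x2 expands
to x1 + x2 + x1:x2.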

Thanks,
George

On 06/21/2011 01:08 PM, David Winsemius wrote:
>
> On Jun 21, 2011, at 3:49 AM, George Markomanolis wrote:
>
>> Dear all,
>>
>> I am new to this field and I have a question about linear regression.
>> I have a dataset of around 31000 points and I want to fit a linear
>> regression. The R-squared is 0.9; however, when I check the diagnostic
>> plots I can see around 250 points with high leverage. As far as I
>> know, points with high leverage strongly influence the fit. If I
>> remove these points to check their influence, the R-squared for the
>> remaining points is 0.71. So I removed less than 1% of my data and the
>> fit is noticeably worse. Could you please give me some advice about
>> this? Is it right to keep these 250 points in my dataset or not?
>> Could I do something else? The data were measured in an experiment,
>> so even these 250 points are real values.
>
> You could be looking at the descriptive statistics on the points.
> Perhaps they are at one end of a variable range, or you perhaps have
> some other feature that is scientifically interesting. So far you have
> only been examining one set of simple linear hypotheses and have not
> (presumably) been looking at any non-linear possibilities or the
> potential that interactions are affecting the outcome. The prior 
> science of your (so far undescribed) domain should be carefully
> considered, but in your message we see no evidence of such.
>
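
The suggestion above to look at descriptive statistics of the
high-leverage points could be sketched as follows. The data are
simulated, and the names x1, x2, y and high_lev, as well as the cutoff
of three times the mean hat value, are assumptions for illustration,
not the real experiment or a fixed rule.

```r
# Simulated data only; x1 is skewed so that a few points sit far out
# at one end of its range.
set.seed(7)
n <- 1000
dat <- data.frame(x1 = rexp(n, rate = 1), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = dat)

# Flag points whose hat value exceeds three times the average (the
# average hat value equals p/n, with p the number of coefficients).
h <- hatvalues(fit)
dat$high_lev <- h > 3 * mean(h)

# Compare where the flagged and unflagged points sit in each predictor.
by_group <- aggregate(cbind(x1, x2) ~ high_lev, data = dat, FUN = median)
print(by_group)
print(sapply(split(dat$x1, dat$high_lev), range))
```

If the flagged rows cluster at one end of a predictor, as they do for x1
in this simulation, that points to the shape of the design rather than
to bad measurements.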

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] question about linear regression and leverage

2011-06-21 Thread George Markomanolis
Dear all,

I am new to this field and I have a question about linear regression.
I have a dataset of around 31000 points and I want to fit a linear
regression. The R-squared is 0.9; however, when I check the diagnostic
plots I can see around 250 points with high leverage. As far as I know,
points with high leverage strongly influence the fit. If I remove these
points to check their influence, the R-squared for the remaining points
is 0.71. So I removed less than 1% of my data and the fit is noticeably
worse. Could you please give me some advice about this? Is it right to
keep these 250 points in my dataset or not? Could I do something else?
The data were measured in an experiment, so even these 250 points are
real values.
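
A minimal sketch of the comparison described above, on simulated data.
The names x and y, the sample sizes, and the 3p/n cutoff for flagging
leverage are assumptions for illustration; they do not reproduce the
real dataset or its numbers.

```r
# Simulated data only: most x values lie in [0, 10], a few lie far out.
set.seed(42)
n <- 2000
x <- c(runif(n - 20, 0, 10), runif(20, 40, 50))
y <- 1 + 2 * x + rnorm(n, sd = 3)
dat <- data.frame(x, y)

fit <- lm(y ~ x, data = dat)

# Hat values measure leverage; a common rule of thumb flags values
# above 2p/n or 3p/n, with p the number of coefficients.
h <- hatvalues(fit)
p <- length(coef(fit))
high <- h > 3 * p / n

# Refit without the flagged points and compare the two fits.
fit_trim <- lm(y ~ x, data = dat[!high, ])
r2_full <- summary(fit)$r.squared
r2_trim <- summary(fit_trim)$r.squared
print(c(full = r2_full, trimmed = r2_trim))
print(rbind(full = coef(fit), trimmed = coef(fit_trim)))

# Cook's distance combines leverage with residual size, so it shows
# whether a high-leverage point actually moves the fit.
print(summary(cooks.distance(fit)))
```

In this simulation the R-squared falls once the far-out points are
dropped, while the coefficients barely move. R-squared depends on how
much the predictor varies, so a lower value after trimming does not by
itself show that the remaining points are fitted worse.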

Thanks a lot,
George
