Re: [R] removing outlier

2015-09-12 Thread Juli
Hi Jim, 

thank you for your help. :)

My point is that there are outliers and I don't really know how to deal with
them.

I need the data frame for a regression, and I have often read that just a few
outliers can change the results considerably. In addition, the regression
diagnostics did not give me good results.
Yes, and I know it is not the point of statistics to work in a way that gets
you the results you would like to have ;).

So what is your suggestion?

And if I remove the outliers, my problem is that, as you said, the cleaned
columns differ in length. I need the data frame for a regression, so can I
remove the whole column, or is there a call to exclude the data?

JULI
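
One possible way to "exclude the data" without touching the data frame is the subset argument of lm(). The sketch below is only an illustration under assumptions: the data frame d, its columns x1, x2, y and the planted extreme value are all made up, and the 3-SD rule is simply the one used in the original post, not a recommendation.

```r
# Hypothetical data: y depends on x1 and x2; row 5 gets an extreme x1 value
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(50)
d$x1[5] <- 25

# TRUE for rows whose x1 lies within 3 SDs of the x1 mean
keep <- abs(d$x1 - mean(d$x1)) <= 3 * sd(d$x1)

fit_all  <- lm(y ~ x1 + x2, data = d)                 # all rows
fit_keep <- lm(y ~ x1 + x2, data = d, subset = keep)  # flagged rows left out, d unchanged

c(all = nobs(fit_all), kept = nobs(fit_keep))
```

Because whole rows are excluded, every variable in the model keeps the same length.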



--
View this message in context: 
http://r.789695.n4.nabble.com/removing-outlier-tp4712137p4712170.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] removing outlier

2015-09-11 Thread Juli
Hey,

I want to remove outliers, so I tried to do this:

# 1 define mean and sd
sd.AT_ZU_SPAET <- sd(AT_ZU_SPAET)
mitt.AT_ZU_SPAET <- mean(AT_ZU_SPAET)
#
sd.Anzahl_BAF <- sd(Anzahl_BAF)
mitt.Anzahl_BAF <- mean(Anzahl_BAF)
#
sd.Änderungsintervall <- sd(Änderungsintervall)
mitt.Änderungsintervall <- mean(Änderungsintervall)
#
# 2 identify outliers 
DA[ abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) > ( 3 * sd.AT_ZU_SPAET)  , ]
DA[ abs(Anzahl_BAF - mitt.Anzahl_BAF) > ( 3 * sd.Anzahl_BAF)  , ]
DA[ abs(Änderungsintervall - mitt.Änderungsintervall) > ( 3 * sd.Änderungsintervall)  , ]
#
# 3 remove outliers
AT_ZU_SPAET.clean <- DA[ (abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) < (3*sd.AT_ZU_SPAET)), ]
Anzahl_BAF.clean <- DA[ (abs(Anzahl_BAF - mitt.Anzahl_BAF) < (3*sd.Anzahl_BAF)), ]
Änderungsintervall.clean <- DA[ (abs(Änderungsintervall - mitt.Änderungsintervall) <
                                 (3*sd.Änderungsintervall)), ]

My problem is that I am only able to remove the outliers of one column of
my table, but I want to remove the outliers of every column of the table.

Could anybody help me?
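
A minimal sketch of one way to apply the same 3-SD rule to every column at once and drop any row that is flagged in at least one of them. The data below are invented stand-ins for DA (with an ASCII spelling of the third column name), and the helper within_3sd is a hypothetical name, so this shows the idea rather than a tested solution for the real data.

```r
# Hypothetical stand-in for the poster's data frame DA (values are made up)
set.seed(42)
DA <- data.frame(AT_ZU_SPAET         = rnorm(100, 10, 2),
                 Anzahl_BAF          = rnorm(100, 50, 5),
                 Aenderungsintervall = rnorm(100, 30, 3))
DA$AT_ZU_SPAET[3] <- 100     # planted outlier
DA$Anzahl_BAF[7]  <- -200    # planted outlier

# TRUE where a value lies within 3 SDs of its own column mean
within_3sd <- function(x) abs(x - mean(x, na.rm = TRUE)) <= 3 * sd(x, na.rm = TRUE)

ok <- sapply(DA, within_3sd)                        # logical matrix, one column per variable
DA.clean <- DA[rowSums(!ok, na.rm = TRUE) == 0, ]   # keep rows with no flagged value

nrow(DA); nrow(DA.clean)
```

Since complete rows are removed, all columns of DA.clean have the same length and the result can go straight into a regression.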




--
View this message in context: 
http://r.789695.n4.nabble.com/removing-outlier-tp4712137.html
Sent from the R help mailing list archive at Nabble.com.


[R] removing blanks from a string

2008-06-27 Thread juli pausas
Hi
Is there a way to remove blank characters from the end of strings in a
vector? Something like the =TRIM functions of the OpenOffice
spreadsheet. E.g.,
a <- c("hola", "Yes ", "hello   ")    # I'd like to get:
c("hola", "Yes", "hello")

Thanks

Juli
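
A short sketch of two base-R ways to strip trailing blanks. The variable name trimmed is only illustrative, and trimws() is available only in R 3.2.0 and later, so it would not have existed when this question was posted.

```r
a <- c("hola", "Yes ", "hello   ")

# Regular expression: delete one or more whitespace characters at the end of each string
trimmed <- sub("[[:space:]]+$", "", a)
trimmed

# Same idea with the helper added in R 3.2.0
trimws(a, which = "right")
```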


--
http://www.ceam.es/pausas



[R] write.table a df with specific column order

2008-06-26 Thread juli pausas
Hi
I'd like to write.table a data frame, but with a specific order of
columns. Is there a direct way to do it, or do I have to generate a new
data frame as follows:

t  <- data.frame(c=1:10, b=11:20, a=letters[1:10])
t2 <- data.frame(a=t$a, b=t$b, c=t$c)
write.table(t2, row.names=FALSE)

Thanks for any comment

Juli
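
As a sketch, indexing the data frame by a vector of column names reorders the columns on the fly, so no second data frame has to be built by hand. The name t_ordered is only illustrative.

```r
t <- data.frame(c = 1:10, b = 11:20, a = letters[1:10])

# Select the columns in the order they should be written
t_ordered <- t[, c("a", "b", "c")]
write.table(t_ordered, row.names = FALSE)
```

The intermediate object is optional; write.table(t[, c("a", "b", "c")], row.names = FALSE) does the same in one call.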

--
http://www.ceam.es/pausas



[R] envelope line from a cloud of points

2008-05-05 Thread juli pausas
Hi,
Is there a way in R to plot an envelope line around a cloud of points
(x, y data)? That is, a smooth line that encloses all points, where
the points do not follow a straight linear pattern. Could somebody
point me to a package or function for this? Thanks in advance.

Juli
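
One simple starting point, sketched here under assumptions, is the convex hull from chull() in base R. It encloses every point but is a polygon, not the smooth line asked for, so it only partly answers the question. The x and y values are made up.

```r
# Made-up, non-linear cloud of points
set.seed(1)
x <- runif(200)
y <- x^2 + rnorm(200, sd = 0.1)

hull <- chull(x, y)    # indices of the points that lie on the convex hull

plot(x, y)
polygon(x[hull], y[hull], border = "red")   # polygon() closes the outline itself
```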

-- 
http://www.ceam.es/pausas



[R] glm binomial with no successes

2008-02-27 Thread juli pausas
Dear all,
I have a question on glm, family binomial. I do not see significant
differences between the levels of a factor (treatment) if all data for
a level are 0; after replacing a 0 with a 1 (in fact reducing the
difference), I detect the significant difference that I expected.
Is there a way to overcome this problem, or is this expected
behaviour? Here is an example:

s <- c(2,4,4,5,0,0,0,0)
f <- c(31,28,28,28,32,37,34,35)
tr <- gl(2, 4)
sf <- cbind(s,f)  # numbers of successes and failures
summary(glm(sf ~ tr, family=binomial))  # tr ns

sf[8,1] <- 1
summary(glm(sf ~ tr, family=binomial))  # tr significant **

Thanks for any suggestion

Juli
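
A sketch that puts the two kinds of test side by side for the data above. When a level has no successes, its estimate runs off towards minus infinity and the Wald z-test printed by summary() becomes unreliable, whereas a likelihood-ratio test compares the deviances of the two models directly. The names fit, wald_p and lrt_p are illustrative only.

```r
s  <- c(2, 4, 4, 5, 0, 0, 0, 0)
f  <- c(31, 28, 28, 28, 32, 37, 34, 35)
tr <- gl(2, 4)

fit <- glm(cbind(s, f) ~ tr, family = binomial)

# Wald z-test for the second level, as printed by summary()
wald_p <- coef(summary(fit))["tr2", "Pr(>|z|)"]

# Likelihood-ratio test for the factor (second row of the analysis-of-deviance table)
lrt_p <- anova(fit, test = "Chisq")[2, "Pr(>Chi)"]

c(wald = wald_p, lrt = lrt_p)
```

R may warn about fitted probabilities numerically 0 or 1 here, which is itself a hint that the Wald standard errors should not be trusted.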

-- 
http://www.ceam.es/pausas



Re: [R] glm binomial with no successes

2008-02-27 Thread juli pausas
Thank you very much for your reply.
So I understand that it would not be correct to use the tests in
summary to assess the significance of the different levels of a
factor relative to the first level, including when there are more
than 2 levels, as in my real case; at least for binomial regressions.
Here is a closer-to-real example, with a 3-level factor:

s <- c(rpois(8, 4), rep(0, 4))
f <- rpois(12, 30)
tr <- gl(3, 4)
sf <- cbind(s,f)
drop1(glm(sf ~ tr, family=binomial), test="Chisq") # significant
summary(glm(sf ~ tr, family=binomial)) # the 3rd level is not significantly different from the 1st

So I understand that I need to split the data and perform the two
tests separately:

drop1(glm(sf ~ tr, family=binomial, subset=(tr %in% c(1, 2))),
      test="Chisq") # ns as expected

drop1(glm(sf ~ tr, family=binomial, subset=(tr %in% c(1, 3))),
      test="Chisq") # significant, as expected

Is this the correct approach?
Many thanks

Juli

On Wed, Feb 27, 2008 at 12:13 PM, Prof Brian Ripley
[EMAIL PROTECTED] wrote:
 On Wed, 27 Feb 2008, juli pausas wrote:

   Dear all,
   I have a question on glm, family binomial. I do not see significant
   differences between the levels of a factor (treatment) if all data for
   a level is 0; and replacing a 0 for a 1 (in fact reducing the
   difference), then I detect the significant difference that I expected.

  This is because you are using the wrong test, one with negligible power.
  See MASS4 pp.197-8 -- you need to use the LRT, as in

  > drop1(glm(sf ~ tr, family=binomial), test="Chisq")
  Single term deletions

  Model:
  sf ~ tr
         Df Deviance    AIC    LRT   Pr(Chi)
  <none>       1.595 17.730
  tr      1   24.244 38.379 22.649 1.944e-06

  (and in your example you can replace 'drop1' by 'anova').


   Is there a way to overcome this problem? or this is an expected
   behaviour ?  Here is an example:
  
   s <- c(2,4,4,5,0,0,0,0)
   f <- c(31,28,28,28,32,37,34,35)
   tr <- gl(2, 4)
   sf <- cbind(s,f)  # numbers of successes and failures
   summary(glm(sf ~ tr, family=binomial))  # tr ns
  
   sf[8,1] <- 1
   summary(glm(sf ~ tr, family=binomial))  # tr significative **
  
   Thanks for any suggestion
  
   Juli
  
   --
   http://www.ceam.es/pausas
  
  

  --
  Brian D. Ripley,                  [EMAIL PROTECTED]
  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
  University of Oxford,             Tel:  +44 1865 272861 (self)
  1 South Parks Road,                     +44 1865 272866 (PA)
  Oxford OX1 3TG, UK                Fax:  +44 1865 272595




-- 
http://www.ceam.es/pausas



[R] reshape

2008-02-10 Thread juli pausas
Dear colleagues,
I'd like to reshape a data frame from long format to wide format, but
I do not quite get what I want. Here is an example of the data I
have (dat):

sp <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5",
          "a5", "a6")
dat <- data.frame(id=1:12, sp=sp, tr=tr, val=31:42, code=code)

and below is what I'd like to obtain. That is, I'd like the tr
variable in different columns (as a timevar) with their value (val).

sp  code  tr.A  tr.B  tr.C
a   a1    31    NA    NA
a   a2    NA    32    NA
a   a2    NA    33    NA   **
a   a3    NA    NA    34
b   a3    35    36    NA
b   a4    NA    NA    37
c   a4    38    NA    NA
d   a4    39    NA    NA
d   a5    NA    40    41
d   a6    NA    NA    42

Using reshape:

reshape(dat[,2:5], direction="wide", timevar="tr", idvar=c("code","sp"))

I'm getting very close. The only difference is in the 3rd row (**),
that is when sp and code are the same I only get one record. Is there
a way to get all records? Any idea?

Thank you very much for any help

Juli Pausas
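
reshape() keeps only one record per combination of the idvar columns and timevar, which is why the repeated a/a2/B row is lost. A possible workaround, sketched here, is to number the repeats and add that counter to idvar. The column name rep is a hypothetical helper, and the value columns come out named val.A, val.B, val.C instead of tr.A, tr.B, tr.C.

```r
sp   <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr   <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5", "a5", "a6")
dat  <- data.frame(id = 1:12, sp = sp, tr = tr, val = 31:42, code = code)

# Number repeated sp/code/tr combinations 1, 2, ... (hypothetical helper column)
dat$rep <- ave(dat$val, dat$sp, dat$code, dat$tr, FUN = seq_along)

# Including 'rep' in idvar makes every repeat its own row in the wide table
wide <- reshape(dat[, c("sp", "code", "rep", "tr", "val")],
                direction = "wide", timevar = "tr",
                idvar = c("sp", "code", "rep"))
wide
```

The helper column can be dropped afterwards with wide$rep <- NULL.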

-- 
http://www.ceam.es/pausas



[R] repeated measures - aov, lme, lmer - help

2007-10-14 Thread juli pausas
Dear all,
I'm not very sure about the use of repeated measures in R, so some advice
would be much appreciated.
Here is a simple example similar to my real problem (R 2.6.0 for
Windows): let's suppose I have annual tree production measured in 9
trees during 3 years; the 9 trees are located on 3 different mountains
(sites), and each tree receives different annual rainfall (different
locations). I would like to know which parameters explain the
variability in production. The data would be something like:

set.seed(111)
mydat <- data.frame(tree = factor(rep(1:9, 3)),
                    year = gl(3, 9, labels = 2001:2003, ordered = TRUE),
                    site = gl(3, 3, 27, labels = c("A", "B", "C")),
                    rain = c(rnorm(9, 100), rnorm(9, 200), rnorm(9, 300)),
                    prod = 51:77 + rnorm(27, 1),
                    pr01 = rbinom(27, 1, 0.5))
mydat
# see for instance
interaction.plot(mydat$year, mydat$site, mydat$prod)

#My first attempt was to use aov:

summary(aov(prod ~ rain + year + site + Error(tree), data=mydat))
#
# Error: tree
#           Df  Sum Sq Mean Sq F value  Pr(>F)
# rain       1  36.814  36.814  6.4423 0.05201 .
# site       2 112.588  56.294  9.8513 0.01843 *
# Residuals  5  28.572   5.714
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Error: Within
#           Df  Sum Sq Mean Sq   F value    Pr(>F)
# rain       1 1688.50 1688.50 1142.6437 1.422e-15 ***
# year       2   12.74    6.37    4.3103   0.03319 *
# Residuals 15   22.17    1.48
#
# The results seem OK to me.
# Is there a way to get predictions from the model? (as in predict.lm)
# And to get the explained variance?

# Using lme (which allows prediction), I suppose the same model
# would be (is that right?):
library(nlme)
res <- lme(prod ~ rain + year + site, data=mydat, random= ~ 1 | tree)
anova(res)
#             numDF denDF   F-value p-value
# (Intercept)     1    15 23027.700  <.0001
# rain            1    15  1144.461  <.0001
# year            2    15     2.288  0.1358
# site            2     6    17.267  0.0032

# Rain varies within trees and between trees (in time), so
# aov gives me the significance of each part (less significant Between
# than Within).
# Does lme give me Within only, or does it include both Between and Within?
# Year was significant in the aov but not in the lme.

# I also want to test a binary (binomial) variable (no production vs
# production), so I guess I should use lmer.
# First the same model as above but with lmer:
library(lme4)
res2 <- lmer(prod ~ rain + year + site + (1|tree), data=mydat); anova(res2)
res2

# Is this the correct way to fit the model above? I'm unsure; the results
# are slightly different.
# I understand that there is no predict for lmer models. anova does
# not give me the significance for lmer models, so I guess I should enter
# the variables in steps and compare models with anova(m1, m2, ...)
# And now the binary data. Is this correct?
res3 <- lmer(pr01 ~ rain + year + site + (1|tree), data=mydat,
             family=binomial); anova(res3)
res3

Many thanks for any comments

Juli
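
On the prediction question, a minimal sketch using the lme fit from the post: predict() for lme objects takes a level argument, where level 0 uses the fixed effects only and level 1 adds each tree's random intercept. The names pred_pop and pred_tree are illustrative, and the sketch says nothing about whether this is the right model for the real data.

```r
library(nlme)

# Same simulated data as in the post
set.seed(111)
mydat <- data.frame(tree = factor(rep(1:9, 3)),
                    year = gl(3, 9, labels = 2001:2003, ordered = TRUE),
                    site = gl(3, 3, 27, labels = c("A", "B", "C")),
                    rain = c(rnorm(9, 100), rnorm(9, 200), rnorm(9, 300)),
                    prod = 51:77 + rnorm(27, 1),
                    pr01 = rbinom(27, 1, 0.5))

res <- lme(prod ~ rain + year + site, data = mydat, random = ~ 1 | tree)

pred_pop  <- predict(res, level = 0)   # population level: fixed effects only
pred_tree <- predict(res, level = 1)   # tree level: fixed effects plus random intercept

head(cbind(observed = mydat$prod, pred_pop, pred_tree))
```

Passing newdata = to predict() gives predictions for new rainfall, year and site combinations in the same way.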

-- 
http://www.ceam.es/pausas
