Re: [R] How to add a new row in to an existing data set in R Language?

2012-10-05 Thread killerkarthick
Hi Ista Zahn,
 Thanks for your advice. Please see the following image.




http://r.789695.n4.nabble.com/file/n4645113/data.png 






 I am expecting the result to be the same as in the image.




--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-add-a-new-row-in-to-an-existing-data-set-in-R-Language-tp4644855p4645113.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to add a new row in to an existing data set in R Language?

2012-10-05 Thread killerkarthick
Hi.
I have one dataset with 3 variables and 4 observations. I want to add
some more observations to the existing dataset.
Data: Mydata
center  drug  change
101 P   18
102 P   14
103 D   12
104 D   18
105 P   10
 This is the data set. I want to add the following rows to the existing
dataset:
106 P   17
107 D   18
108 D   NA
109 P   13
110 P   12
Is this possible without using any combine functions?
Please reply.



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-add-a-new-row-in-to-an-existing-data-set-in-R-Language-tp4645114.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to add a new row in to an existing data set in R Language?

2012-10-05 Thread killerkarthick
Thanks for your answer, Arun.
The above image shows the merge in SAS; is the same possible in R?
Please let me know.



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-merge-data-set-in-R-Language-tp4644855p4645119.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rainflow, range pair counting

2012-10-05 Thread Dandy Ehlert
That is right. It is part of materials engineering; you could say it's a
standard method in the field. A better description is on Wikipedia :)


http://en.wikipedia.org/wiki/Rainflow-counting_algorithm



I have found packages for all the other big statistics software like Matlab or
Diadem, but none for R. And I thought R could do everything except make coffee.
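For reference, a minimal level-crossing counter (the first of the three methods named below) is only a few lines of base R. This is just a sketch on an invented signal, not a validated fatigue-analysis implementation:

## Count upward crossings of a set of levels by a signal (sketch only)
level.crossings <- function(x, levels) {
  sapply(levels, function(L) sum(x[-length(x)] < L & x[-1] >= L))
}
set.seed(1)
sig  <- cumsum(rnorm(1000))          # invented example signal
lvls <- pretty(range(sig), 20)       # levels at which to count crossings
data.frame(level = lvls, upcrossings = level.crossings(sig, lvls))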



2012/10/4 Bert Gunter gunter.ber...@gene.com

 I'm going to make a wild guess that this might have something to do
 with "bump hunting" = local conditional extrema. If so, the "prim"
 package might be applicable.

 Otherwise (...cycles in time series) time series methods might be
 appropriate.

 But I basically agree with others that the OP has not clearly
 specified the issue.

 Cheers,
 Bert

 On Thu, Oct 4, 2012 at 10:34 AM, Marc Schwartz marc_schwa...@me.com
 wrote:
  A quick Google search suggests that the methods are used in material
 fatigue studies where varying stress loads are placed upon the material,
 which puts this into a materials engineering context.
 
  Using rseek.org does not give any joy. Perhaps someone with experience
 in that domain will chime in.
 
  Regards,
 
  Marc Schwartz
 
  On Oct 4, 2012, at 10:27 AM, R. Michael Weylandt 
 michael.weyla...@gmail.com wrote:
 
  R is used by many many folks, and most of us don't have the domain
  knowledge to make heads or tails of what you're mentioning here.
  You'll need to greatly clarify and perhaps ask on the appropriate
  R-SIG-* list. (Ecology maybe?) Perhaps rseek.org is also of help.
 
  Cheers,
  Michael
 
  On Thu, Oct 4, 2012 at 1:19 PM, Dandy Ehlert dandy.ehl...@gmail.com
 wrote:
  Hello
 
 
 
  I have a question about R. I'm searching for a function or R code
  which computes some collectives for me. In the literature they are called:
 
 
 
  level crossing counting
 
  rainflow counting
 
  range pair counting
 
 
 
  I can't believe that's not available for R. Maybe I am just looking for the
  wrong word.
 
 
  These functions are used to count the cycles (ranges and means) of a
  time series.
 
 
 
  Thx for helping :)
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with convergence in mle2/optim function

2012-10-05 Thread Bert Gunter
Presumably you checked out the CRAN Optimization task view to see if
you could find any joy there, right? (I doubt there is, but ...)

-- Bert
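One generic remedy worth trying for boundary trouble of this kind (a sketch, not tied to the exact likelihood quoted below): optimize over the log-parameters so p1, p2, mu1 and mu2 stay strictly positive and the box constraint can be dropped. Here 'nll' and 'mydata' are hypothetical stand-ins for NLL.func and the data set below.

## Unconstrained optimization on the log scale (sketch)
nll.log <- function(logpar, y) {
  par <- exp(logpar)                      # back-transform to the natural scale
  nll(par[1], par[2], par[3], par[4], y)  # hypothetical negative log-likelihood
}
fit <- optim(par = log(c(p1 = 0.3, p2 = 0.3, mu1 = 0.08, mu2 = 0.08)),
             fn = nll.log, y = mydata, method = "BFGS")
exp(fit$par)                              # estimates back on the original scale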

On Thu, Oct 4, 2012 at 10:12 PM, Adam Zeilinger zeil0...@umn.edu wrote:
 Hello R Help,

 I am trying solve an MLE convergence problem: I would like to estimate four
 parameters, p1, p2, mu1, mu2, which relate to the probabilities, P1, P2, P3,
 of a multinomial (trinomial) distribution.  I am using the mle2() function
 and feeding it a time series dataset composed of four columns: time point,
 number of successes in category 1, number of successes in category 2, and
 number of success in category 3.  The column headers are: t, n1, n2, and n3.

 The mle2() function converges occasionally, and I need to improve the rate
 of convergence when used in a stochastic simulation, with multiple
 stochastically generated datasets.  When mle2() does not converge, it
 returns an error: Error in optim(par = c(2, 2, 0.001, 0.001), fn = function
 (p) : L-BFGS-B needs finite values of 'fn'.  I am using the L-BFGS-B
 optimization method with a lower box constraint of zero for all four
 parameters.  While I do not know any theoretical upper limit(s) to the
 parameter values, I have not seen any parameter estimates above 2 when using
 empirical data.  It seems that when I start with certain 'true' parameter
 values, the rate of convergence is quite high, whereas other true
 parameter values are very difficult to estimate.  For example, the true
 parameter values p1 = 2, p2 = 2, mu1 = 0.001, mu2 = 0.001 causes convergence
 problems, but the parameter values p1 = 0.3, p2 = 0.3, mu1 = 0.08, mu2 =
 0.08 lead to high convergence rate.  I've chosen these two sets of values
 because they represent the upper and lower estimates of parameter values
 derived from graphical methods.

 First, do you have any suggestions on how to improve the rate of convergence
 and avoid the finite values of 'fn' error?  Perhaps it has to do with the
 true parameter values being so close to the boundary?  If so, any
 suggestions on how to estimate parameter values that are near zero?

 Here is reproducible and relevant code from my stochastic simulation:

 
 library(bbmle)
 library(combinat)

 # define multinomial distribution
 dmnom2 <- function(x, prob, log=FALSE) {
   r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
   if (log) r else exp(r)
 }

 # vector of time points
 tv <- 1:20

 # Negative log likelihood function
 NLL.func <- function(p1, p2, mu1, mu2, y){
   t <- y$tv
   n1 <- y$n1
   n2 <- y$n2
   n3 <- y$n3
   P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
 mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2))) -
 exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
 mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2*t)*mu2*
 sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/
 exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
 sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2
   P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
 mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2))) -
 exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
 mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2*t)*mu1*
 sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/
 exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
 4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
 sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2
   P3 <- 1 - P1 - P2
   p.all <- c(P1, P2, P3)
   -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
 }

 ## Generate simulated data
 # Model equations as expressions,
  P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
   mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))) -
   exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
   mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
   2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2*t)*mu2*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/
   exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + 

Re: [R] How to add a new row in to an existing data set in R Language?

2012-10-05 Thread PIKAL Petr
Hi

?merge or ?rbind

Regards
Petr
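For example, a sketch with the posted values (rbind() is the usual tool; plain indexed assignment works if combine-style functions are to be avoided):

Mydata <- data.frame(center = 101:105,
                     drug   = c("P", "P", "D", "D", "P"),
                     change = c(18, 14, 12, 18, 10),
                     stringsAsFactors = FALSE)
newrows <- data.frame(center = 106:110,
                      drug   = c("P", "D", "D", "P", "P"),
                      change = c(17, 18, NA, 13, 12),
                      stringsAsFactors = FALSE)
Mydata <- rbind(Mydata, newrows)                 # the standard way
## or, appending a single row without rbind() (values invented):
Mydata[nrow(Mydata) + 1, ] <- list(111, "P", 15)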

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of killerkarthick
 Sent: Friday, October 05, 2012 6:39 AM
 To: r-help@r-project.org
 Subject: [R] How to add a new row in to an existing data set in R
 Language?
 
 Hi.
 I have one dataset with 3 variables with 4 observations. I want to add
 the some more observation into the existing dataset.
 Data: Mydata
 center  drug  change
 101   P   18
 102   P   14
 103   D   12
 104   D   18
 105   P   10
  This is the data set. So I want to add some more rows to the existing
 dataset that rows are
 106   P   17
 107   D   18
 108   D   NA
 109   P   13
 110   P   12
 Without using any combine functions. This is possible?
 Please reply me...
 
 
 
 --
 View this message in context: http://r.789695.n4.nabble.com/How-to-add-
 a-new-row-in-to-an-existing-data-set-in-R-Language-tp4645114.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot with some 0 frequencies

2012-10-05 Thread David Winsemius

On Oct 4, 2012, at 9:05 PM, David Winsemius wrote:

 
 On Oct 4, 2012, at 4:49 PM, Guillaume2883 wrote:
 
 Hi all,
 
 I am back with a new question !
 I recorded the occurrence of 4 different events at 20 places for a given time
 period.
 Now, I want to do some barplots of the frequency of these events for each
 place, so it should be easy. My problem is that I want to see the
 frequencies of the 4 events on my barplots even if the frequency of some of
 them is 0.
 How could I do that?
 
 
 Put a big zero on the axis?

barchart(abs(-3:3) ~ letters[1:7], origin=0)

require(grid)
trellis.focus("panel", 1, 1)
 grid.text(label=paste(0),
 x = convertX(unit(4, "native"), "npc"),
 y = convertY(unit(0, "native"), "npc")
 )
 trellis.unfocus()
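If the counts come from raw observations, another option (a sketch with invented event names) is to keep the unused levels in a factor, so table() reports the zeros and barplot() reserves a slot for them:

events <- factor(c("A", "A", "C", "A", "C"),
                 levels = c("A", "B", "C", "D"))   # B and D never occur
barplot(table(events))                             # zero-count bars still appear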

-- 

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Format of numbers in plotmath expressions.

2012-10-05 Thread Rolf Turner


I want to do something like:

TH <- sprintf("%1.1f", c(0.3,0.5,0.7,0.9,1))
plot(1:10)
legend("bottomright", pch=1:5, legend=parse(text=paste("theta ==", TH)))

Notice that the final 1 comes out in the legend as just plain 1 and NOT
as 1.0 although TH is

[1] "0.3" "0.5" "0.7" "0.9" "1.0"

I can get plotmath to preserve 1.0 as 1.0 and NOT convert it to 1
if I use substitute, as in

text(2,5,labels=substitute(list(theta == a),list(a=TH[5])))

but substitute doesn't work appropriately with vectors.

Can anyone tell me how to get a 1.0 rather than 1 in the legend?

Ta.

cheers,

Rolf Turner

P.S. Just figured out a way using sapply():

leg <- sapply(TH, function(x){substitute(list(theta == a), list(a=x))})
plot(1:10)
legend("bottomright", pch=1:5, legend=parse(text=leg))

Note that the use of parse (pace Thomas Lumley! :-) ) is required ---
legend=leg does NOT work.

Getting here required an enormous amount of trial and error.  And it seems
pretty kludgy.

Is there a sexier way?

R. T.
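One more route that appears to work (sketch): quote the formatted number inside the expression, so parse() keeps the string "1.0" instead of turning it into the numeric 1:

TH <- sprintf("%1.1f", c(0.3, 0.5, 0.7, 0.9, 1))
plot(1:10)
legend("bottomright", pch = 1:5,
       legend = parse(text = sprintf("theta == '%s'", TH)))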

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] many-facet rasch analysis in R

2012-10-05 Thread D. Alain
Dear R-list, 

is there any R package that I can use to carry out a many-facet Rasch analysis
in R?

Thank you.

Best wishes 
Alain

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with colors in contour plot

2012-10-05 Thread Jim Lemon

On 10/04/2012 10:25 PM, Loukia Spineli wrote:

Dear R users,

I have a 51 by 51 matrix of p-values (named as pvalue_MA). I want to
present graphically this matrix in a plot (filled contour plot) where both
axes represent probabilities. I have also added a grid in this plot. I want
to highlight in white the cells of the grid that represent p-values smaller
than the (common) significance threshold, 0.05. The code for this plot is
colored in blue (I had to copy-paste all of it in order to get to the plot).
I suspect that the problem might be in the col parameter of the
filled.contour function. Honestly, I cannot understand why the plot
appears to be completely white!! I checked the values of my variable and it
has a wide range from 0 to 1.
Any suggestion/ comment would really help me to move on with my project.


Hi Loukia,
Do you really need a contour plot for this?

pvals <- matrix(runif(2601), nrow=51)
library(plotrix)
pcols <- ifelse(pvals <= 0.05, "white", "gray")
color2D.matplot(pvals, cellcolors=pcols)

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating vegetation distance groups from one column

2012-10-05 Thread Jhope
Thank you! That has worked for me when creating graphs. In plyr I used the
script:

# Veg Index 
data.to.analyze$VegIndex <- cut(data.to.analyze$Veg,
  breaks=seq(0, 35, 5), include.lowest=TRUE)
VegIndex - data.to.analyze$VegIndex
plot(VegIndex)

But the vegetation distances on the x-axis in the graph are showing up as: 
[-5,0] (0,5] (5,10] (10,15] (15,20] (20,25] (25,30] 

I am concerned these vegetation classes are not grouped properly and that there
is overlap between the classes. It should read, preferably without brackets
or with only one kind of bracket:
(-5-0) (1-5) (6-10) (11-15) (16-20) (21-25) (26-30)

How do I fix this?
Please advise, Jean



--
View this message in context: 
http://r.789695.n4.nabble.com/Creating-vegetation-distance-groups-from-one-column-tp4644970p4645127.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Anova

2012-10-05 Thread Jhope
Hi R-listers, 

I am trying to do an ANOVA for the following scatterplot and received the
following error:

library(car)
scatterplot(HSuccess ~ Veg,
    data = data.to.analyze,
    xlab = "Vegetation border (m)",
    ylab = "Hatching success (%)")

anova(HSuccess ~ Veg, data=data.to.analyze)

Error in UseMethod("anova") :
  no applicable method for 'anova' applied to an object of class "formula"

I am wondering if there is a better way to do this?
Please advise, Jean




--
View this message in context: http://r.789695.n4.nabble.com/Anova-tp4645130.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in lmer: asMethod(object) : matrix is not symmetric [1, 2]

2012-10-05 Thread Olivier Blaser
Dear R Users,

I am having trouble with lmer. I am looking at recombinant versus non 
recombinant individuals. In the response variable recombinant 
individuals are coded as 1's and non-recombinant as 0's. I built a model 
with 2 fixed factors and 1 random effect. Sex (males/females) is the 
first fixed effect and sexual genotype (XY, YY, WX and WY) the second 
one. Sexual Genotype is nested in sex. XY and YY individuals are males 
and WX and WY females. I crossed 8 XY males with 8 WY females and 8 YY 
males with 8 WX females. Each cross corresponds to a family (i.e. family 
1 to 8 for XY x WY and family 9 to 16 for YY x WX). For each family I have 
20 offspring. Family is nested in sexual genotype as a random factor.

My data looks like:

reps <- factor(sample(c(0,1), 640, replace=T, prob=c(0.95,0.05)))

sex <- factor(rep(c("M","F"), each=320))

geno <- factor(rep(c("XY","YY","WY","WX"), each=160))

fam <- factor(rep(c(1:8,9:16,1:8,9:16), each=20))

dat <- data.frame(reps,sex,geno,fam)

and I built the following model:

lmer(reps~sex+geno+(1|fam),data=dat,family=binomial)

and tried also

lmer(reps~sex/geno+(1|fam),data=dat,family=binomial)

but I keep getting this error:

asMethod(object) : matrix is not symmetric [1,2]

Does someone have an idea where the error might come from?

Thank you very much in advance!

Olivier

Sex-chromosome turnovers induced by deleterious mutation load


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] seasonal ARFIMA for fractional values for lag non-seasonal and seasonal lag

2012-10-05 Thread normah
I need help: does anybody know how to analyse data using the seasonal ARFIMA
method where the values of the seasonal and non-seasonal d are fractional?
Please help.



--
View this message in context: 
http://r.789695.n4.nabble.com/seasonal-ARFIMA-for-fractional-values-for-lag-non-seasonal-and-seasonal-lag-tp4645131.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with colors in contour plot

2012-10-05 Thread Loukia Spineli
Thank you very much, Jim!!!

Your function colors the grid the way I want, but it can handle
only one variable, if I have understood its details correctly. The plot must
display percentages on the axes and p-values in the grid cells. I have
incorporated 3 different variables in a contour plot.
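In case it helps, a minimal filled.contour() sketch in the spirit of the original approach, with the lowest band (p < 0.05) forced to white; the p-value matrix and axis values are invented here:

pvals <- matrix(runif(2601), nrow = 51)          # invented 51 x 51 p-value matrix
prob  <- seq(0, 1, length.out = 51)              # axes as probabilities
filled.contour(prob, prob, pvals,
               levels = c(0, 0.05, 0.25, 0.5, 0.75, 1),
               col = c("white", grey(c(0.8, 0.6, 0.4, 0.2))))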

On Fri, Oct 5, 2012 at 12:08 PM, Jim Lemon j...@bitwrit.com.au wrote:

 On 10/04/2012 10:25 PM, Loukia Spineli wrote:

 Dear R users,

 I have a 51 by 51 matrix of p-values (named as pvalue_MA). I want to
 present graphically this matrix in a plot (filled contour plot) where both
 axes represent probabilities. I have also added a grid in this plot. I
 want
 to highlight in white the cells of the grid that represent p-values
 smaller
 than the (common) significance threshold, 0.05. The code from this plot is
 colored in blue (I had to copy-pasted all this code in order to reach to
 the plot). I suspect that the problem might be in the col parameter of
 the filled.contour function. Honestly, I cannot understand why the plot
 appears to be completely white!! I checked the values of my variable and
 it
 has a great range from 0 to 1.
 Any suggestion/ comment would really help me to move on with my project.

  Hi Loukia,
 Do you really need a contour plot for this?

 pvals <- matrix(runif(2601), nrow=51)
 library(plotrix)
 pcols <- ifelse(pvals <= 0.05, "white", "gray")
 color2D.matplot(pvals, cellcolors=pcols)

 Jim



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with convergence in mle2/optim function

2012-10-05 Thread Berend Hasselman

On 05-10-2012, at 07:12, Adam Zeilinger wrote:

 Hello R Help,
 
 I am trying solve an MLE convergence problem: I would like to estimate four 
 parameters, p1, p2, mu1, mu2, which relate to the probabilities, P1, P2, P3, 
 of a multinomial (trinomial) distribution.  I am using the mle2() function 
 and feeding it a time series dataset composed of four columns: time point, 
 number of successes in category 1, number of successes in category 2, and 
 number of success in category 3.  The column headers are: t, n1, n2, and n3.
 
 The mle2() function converges occasionally, and I need to improve the rate of 
 convergence when used in a stochastic simulation, with multiple 
 stochastically generated datasets.  When mle2() does not converge, it returns 
 an error: Error in optim(par = c(2, 2, 0.001, 0.001), fn = function (p) : 
 L-BFGS-B needs finite values of 'fn'.  I am using the L-BFGS-B optimization 
 method with a lower box constraint of zero for all four parameters.  While I 
 do not know any theoretical upper limit(s) to the parameter values, I have 
 not seen any parameter estimates above 2 when using empirical data.  It seems 
 that when I start with certain 'true' parameter values, the rate of 
 convergence is quite high, whereas other true parameter values are very 
 difficult to estimate.  For example, the true parameter values p1 = 2, p2 = 
 2, mu1 = 0.001, mu2 = 0.001 causes convergence problems, but the parameter 
 values p1 = 0.3, p2 = 0.3, mu1 = 0.08, mu2 = 0.08 lead to high convergence 
 rate.  I've chosen these two sets of values because they represent the upper and lower 
estimates of parameter values derived from graphical methods.
 
 First, do you have any suggestions on how to improve the rate of convergence 
 and avoid the finite values of 'fn' error?  Perhaps it has to do with the 
 true parameter values being so close to the boundary?  If so, any suggestions 
 on how to estimate parameter values that are near zero?
 
 Here is reproducible and relevant code from my stochastic simulation:
 
 
 library(bbmle)
 library(combinat)
 
 # define multinomial distribution
 dmnom2 <- function(x, prob, log=FALSE) {
  r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
  if (log) r else exp(r)
 }
 
 # vector of time points
 tv <- 1:20
 
 # Negative log likelihood function
 NLL.func <- function(p1, p2, mu1, mu2, y){
  t <- y$tv
  n1 <- y$n1
  n2 <- y$n2
  n3 <- y$n3
  P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))) -
exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2*t)*mu2*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/
exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2
  P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))) -
exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2*t)*mu1*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/
exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2
  P3 <- 1 - P1 - P2
  p.all <- c(P1, P2, P3)
  -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
 }
 
 ## Generate simulated data
 # Model equations as expressions,
  P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
  mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))) -
  exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
  mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
  2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2*t)*mu2*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/
  exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)
 
  P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
  mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + 

Re: [R] Anova

2012-10-05 Thread Rui Barradas

Hello,

You're thinking of ?aov. anova() does _not_ have a formula interface, it 
would be


anova(lm(HSuccess ~ Veg, data = data.to.analyze))

or

aov(HSuccess ~ Veg, data = data.to.analyze)

Hope this helps,

Rui Barradas
Em 05-10-2012 09:27, Jhope escreveu:

Hi R-listers,

I am trying to do an ANOVA for the following scatterplot and received the
following error:

library(car)
scatterplot(HSuccess ~ Veg,
  data = data.to.analyze,
  xlab = "Vegetation border (m)",
  ylab = "Hatching success (%)")

anova(HSuccess ~ Veg, data=data.to.analyze)

Error in UseMethod("anova") :
   no applicable method for 'anova' applied to an object of class "formula"

I am wondering if there is a better way to do this?
Please advise, Jean




--
View this message in context: http://r.789695.n4.nabble.com/Anova-tp4645130.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova

2012-10-05 Thread John Fox
Dear Jean,

On Fri, 5 Oct 2012 01:27:55 -0700 (PDT)
 Jhope jeanwaij...@gmail.com wrote:
 Hi R-listers, 
 
 I am trying to do an ANOVA for the following scatterplot and received the
 following error:
 
 library(car)
 scatterplot(HSuccess ~ Veg, 
 data = data.to.analyze,
 xlab = "Vegetation border (m)", 
 ylab = "Hatching success (%)")
 
 anova(HSuccess ~ Veg, data=data.to.analyze)
 
 Error in UseMethod("anova") : 
   no applicable method for 'anova' applied to an object of class "formula"
 
 I am wondering if there is a better way to do this?

anova() needs a model object, not a formula, as its first argument:

anova(lm(HSuccess ~ Veg, data=data.to.analyze))

Alternatively, you can use aov(), with summary(), to get the ANOVA table:

summary(aov(HSuccess ~ Veg, data=data.to.analyze))

I hope this helps,
 John


John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

 Please advise, Jean
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Anova-tp4645130.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating vegetation distance groups from one column

2012-10-05 Thread arun
Hi,
Try this:
data.to.analyze <- read.table(text="
Area Veg
456   0
3400  10
79  25
56   18
467 4
67  7
839    30
1120  16
3482  32
", sep="", header=TRUE)
data.to.analyze$VegIndex = cut(data.to.analyze$Veg,
    breaks=c(-5,0,5,10,15,20,25,30,35),
    labels=c("(-5-0)", "(1-5)", "(6-10)", "(11-15)",
             "(16-20)", "(21-25)", "(26-30)", "(31-35)"))
VegIndex <- data.to.analyze$VegIndex
plot(VegIndex)
A.K.




- Original Message -
From: Jhope jeanwaij...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, October 5, 2012 3:38 AM
Subject: Re: [R] Creating vegetation distance groups from one column

Thank you! That has worked for me when creating graphs. In plyr I used the
script:

# Veg Index 
data.to.analyze$VegIndex <- cut(data.to.analyze$Veg,
      breaks=seq(0, 35, 5), include.lowest=TRUE)
VegIndex - data.to.analyze$VegIndex
plot(VegIndex)

But the vegetation distances on the x-axis in the graph are showing up as: 
[-5,0] (0,5] (5,10] (10,15] (15,20] (20,25] (25,30] 

I am concerned these vegetation classes are not grouped probably and there
is overlap between the classes. It should read, preferably without brackets
or only one kind (): 
(-5-0) (1-5) (6-10) (11-15) (16-20) (21-25) (26-30)

How do I fix this?
Please advise, Jean



--
View this message in context: 
http://r.789695.n4.nabble.com/Creating-vegetation-distance-groups-from-one-column-tp4644970p4645127.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data structure for plsr

2012-10-05 Thread Bjørn-Helge Mevik
Emma Jones evjo...@ualberta.ca writes:

 My current data structure consists of a .csv
 file read into R containing 15 columns (a charcoal dilution series going
from 100% to 0%) and 1050 rows of absorbance data from 400 nm to 2500 nm at
  2 nm intervals. I think I need to transpose the data such that the specific
 wavelengths become my columns and dilutions are defined in rows,

Yes, you need to transpose the data so a column corresponds to a
variable (response or predictor).

 Should I (and how do I) make my absorbance data into
 individual matrices that read into a data frame with only two columns

It is best to put all predictors (wavelengths) together in one matrix,
yes.  The same for the responses, if you have more than one response
column.

This is untested, so there might be errors:

Assuming that your spectroscopic data is read into a data frame called
origspec:

## This should create a matrix with the wavelengths as columns:
spec <- t(as.matrix(origspec))

I don't know what your response is, so I'm just assuming it is in a
vector called resp.

# This would create a data frame suitable for plsr():
mydata <- data.frame(resp = resp, spec = I(spec))

Then you can analyse like this:

plsr(resp ~ spec, data = mydata, )

-- 
Bjørn-Helge Mevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to add a new row in to an existing data set in R Language?

2012-10-05 Thread arun
Hi Karthick,

Not sure if I understand your message correctly.  Are you saying that the SAS 
merge as shown in the image (Nabble) is correct? The fourth row in the merged 
data (105 21JUN1980 21JAN2006) is incorrect. 

A.K.



- Original Message -
From: killerkarthick karthick@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, October 5, 2012 2:18 AM
Subject: Re: [R] How to add a new row in to an existing data set in R Language?

Thanks for your answer Arun,
That above image  shows the merge in SAS but its possible as same in R
please reply me let me know?



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-merge-data-set-in-R-Language-tp4644855p4645119.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating vegetation distance groups from one column

2012-10-05 Thread Rui Barradas

Hello,

That now seems to be a presentation problem. See this example:


x <- 0:30 + runif(124)
y <- cut(x, breaks = seq(0, 35, 5))

l <- levels(y)
l1 <- sub("\\]", ")", l[1])
l2 <- as.numeric(sub("\\(([[:digit:]]+),.*", "\\1", l[-1])) + 1
l3 <- sub(".*,([[:digit:]]+).*", "\\1", l[-1])
l.new <- c(l1, paste0("(", l2, ",", l3, ")"))
levels(y) <- l.new

str(y)
barplot(table(y))


Instead of 'y' use data.to.analyze$VegIndex and it should give what you 
want.


Hope this helps,

Rui Barradas
Em 05-10-2012 08:38, Jhope escreveu:

Thank you! That has worked for me when creating graphs. In plyr I used the
script:

# Veg Index
data.to.analyze$VegIndex <- cut(data.to.analyze$Veg,
   breaks=seq(0, 35, 5), include.lowest=TRUE)
VegIndex - data.to.analyze$VegIndex
plot(VegIndex)

But the vegetation distances on the x-axis in the graph are showing up as:
[-5,0] (0,5] (5,10] (10,15] (15,20] (20,25] (25,30]

I am concerned these vegetation classes are not grouped probably and there
is overlap between the classes. It should read, preferably without brackets
or only one kind ():
(-5-0) (1-5) (6-10) (11-15) (16-20) (21-25) (26-30)

How do I fix this?
Please advise, Jean



--
View this message in context: 
http://r.789695.n4.nabble.com/Creating-vegetation-distance-groups-from-one-column-tp4644970p4645127.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] avoid <<- in specific case

2012-10-05 Thread Berry Boessenkool


Hi all,

I improved a function drawing horizontal histograms (code below) which uses 
barplot and works fine.
It does, however, need to assign a function to the global environment to later 
find the actual location on the vertical axis, and not the number of bars used 
by barplot.
Hopefully, running the examples below will illustrate that.

As said, it works perfectly fine and does exactly what I want.
The problem arises now that I'm building a package (just for myself, for now) 
and run into R CMD 
check telling me 'no visible binding for '<<-' assignment'. 
(wording below)
Similar problem as in 
http://r.789695.n4.nabble.com/R-CMD-check-tells-me-no-visible-binding-for-global-variable-what-does-it-mean-td1837236.html

My question is:
Is there a way to avoid assigning the function to the global environment with 
<<- but still have it available? I know it is generally not good practice.
Or is it OK in a case like this, and is there a way to avoid the notes from 
the R CMD check (make the function package-compatible)?
Or should I just ignore these notes? (I'm completely new to building packages 
and cannot judge the importance yet.)

I'd be very thankful for any hints!

Berry

PS: I recently read about barcharts in lattice, but by now I'm already used to 
my function. (And I learned a lot writing it a couple of years ago).

 
# Function
horiz.hist <- function(Data, breaks="Sturges", col="transparent", las=1, 
ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat, border=par("fg"), 
... )
  {a <- hist(Data, plot=FALSE, breaks=breaks)
  HBreaks <- a$breaks
  HBreak1 <- a$breaks[1]
  hpos <<- function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/ 
diff(range(HBreaks)) # Assign a function to the global environment with values 
calculated inside the main function.
  barplot(a$counts, space=0, horiz=T, ylim=hpos(ylim), col=col, 
border=border,...)  
  axis(2, at=hpos(labelat), labels=labels, las=las, ...) 
  print("use hpos() to address y-coordinates") }

# Data and basic concept
set.seed(8); ExampleData <- rnorm(50,8,5)+5
hist(ExampleData)
horiz.hist(ExampleData, xlab="absolute frequency") 
# Caution: the labels at the y-axis are not the real coordinates!
# abline(h=2) will draw above the second bar, not at the label value 2. Use hpos:
abline(h=hpos(11), col=2)

# Further arguments
horiz.hist(ExampleData, xlim=c(-8,20)) 
horiz.hist(ExampleData, main="the ... argument worked!", col.axis=3) 
hist(ExampleData, xlim=c(-10,40)) # with xlim
horiz.hist(ExampleData, ylim=c(-10,40), border="red") # with ylim
horiz.hist(ExampleData, breaks=20, col="orange")
axis(2, hpos(0:10), labels=F, col=2) # another use of hpos()

# One shortcoming: doesn't work with breakpoints provided as a vector with 
different widths of the bars


Wording from the rcmd check when building a package:
* checking R code for possible problems ... NOTE
horiz.hist: no visible binding for '<<-' assignment to 'hpos'
horiz.hist: no visible global function definition for 'hpos'


  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] avoid - in specific case

2012-10-05 Thread Duncan Murdoch

On 05/10/2012 8:19 AM, Berry Boessenkool wrote:


Hi all,

I improved a function drawing horizontal histograms (code below) which uses 
barplot and works fine.
It does,however, need to assign a function to the global environment to later 
find the actual location on the vertical axis, and not the number of bars used 
by barplot.
Hopefully, running the examples below will illustrate that.

As said, it works perfectly fine and does exactly what I want.
The problem arises now that I'm building a package (just for myself, for now) 
and run into R CMD
check telling me 'no visible binding for '<<-' assignment'.
(wording below)
Similar problem as in 
http://r.789695.n4.nabble.com/R-CMD-check-tells-me-no-visible-binding-for-global-variable-what-does-it-mean-td1837236.html

My question is:
Is there a way to avoid assigning the function to the global environment with 
<<- but still have it available? I know it is generally not good practice.


You can return the function as the value of your function.   A bonus:  
if it is created within the body of your function, it will have access 
to all the local variables there.


You shouldn't write to the global environment, because globalenv belongs 
to the user, not to you.  If the user wants your function in the global 
environment s/he can just assign the value of your function to a 
variable there.


Duncan Murdoch
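A condensed sketch of that idea, based on the horiz.hist() code quoted below (the name horiz.hist2 and the simplifications are only for illustration); the caller captures hpos instead of having it written to the global environment:

horiz.hist2 <- function(Data, breaks = "Sturges", ...) {
  a <- hist(Data, plot = FALSE, breaks = breaks)
  HBreaks <- a$breaks
  hpos <- function(Pos)                         # converter stays local ...
    (Pos - HBreaks[1]) * (length(HBreaks) - 1) / diff(range(HBreaks))
  barplot(a$counts, space = 0, horiz = TRUE, ...)
  invisible(hpos)                               # ... and is returned to the caller
}

hpos <- horiz.hist2(rnorm(50, 8, 5) + 5)        # the user decides where to store it
abline(h = hpos(11), col = 2)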



Or ist it OK in a case like this, and is there a way to avoid the notes from 
the rcmd check (make the function package-compatible)?
Or should I just ignore these notes? (I'm completely new to building packages 
and cannot judge the importance yet.)

I'd be very thankful for any hints!

Berry

PS: I recently read about barcharts in lattice, but by now I'm already used to 
my function. (And I learned a lot writing it a couple of years ago).

  
# Function

horiz.hist <- function(Data, breaks="Sturges", col="transparent", las=1,
ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat, border=par("fg"), 
... )
   {a <- hist(Data, plot=FALSE, breaks=breaks)
   HBreaks <- a$breaks
   HBreak1 <- a$breaks[1]
   hpos <<- function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/ 
diff(range(HBreaks)) # Assign a function to the global environment with values 
calculated inside the main function.
   barplot(a$counts, space=0, horiz=T, ylim=hpos(ylim), col=col, 
border=border,...)
   axis(2, at=hpos(labelat), labels=labels, las=las, ...)
   print("use hpos() to address y-coordinates") }

# Data and basic concept
set.seed(8); ExampleData <- rnorm(50,8,5)+5
hist(ExampleData)
horiz.hist(ExampleData, xlab="absolute frequency")
# Caution: the labels at the y-axis are not the real coordinates!
# abline(h=2) will draw above the second bar, not at the label value 2. Use hpos:
abline(h=hpos(11), col=2)

# Further arguments
horiz.hist(ExampleData, xlim=c(-8,20))
horiz.hist(ExampleData, main="the ... argument worked!", col.axis=3)
hist(ExampleData, xlim=c(-10,40)) # with xlim
horiz.hist(ExampleData, ylim=c(-10,40), border="red") # with ylim
horiz.hist(ExampleData, breaks=20, col="orange")
axis(2, hpos(0:10), labels=F, col=2) # another use of hpos()

# One shortcoming: doesn't work with breakpoints provided as a vector with 
different widths of the bars


Wording from the rcmd check when building a package:
* checking R code for possible problems ... NOTE
horiz.hist: no visible binding for '<<-' assignment to 'hpos'
horiz.hist: no visible global function definition for 'hpos'




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-10-05 Thread user1234
Rui, 

Your response nearly answered a similar question of mine except that I also
have ecdfs of different lengths.  

Do you know how I can adjust  x <- seq(min(loga, logb), max(loga, logb),
length.out=length(loga)) 
to account for this?  It must be the length.out argument, but I'm unsure how to
proceed.

Any advice is much appreciated.

-L


Rui Barradas wrote
 Hello,
 
 Try the following.
 (i've changed the color of the first ecdf.)
 
 
 loga <- log10(a+1) # do this
 logb <- log10(b+1) # only once
 
 f.a <- ecdf(loga)
 f.b <- ecdf(logb)
 # (2) max distance D
 
 x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
 x0 <- x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )]
 y0 <- f.a(x0)
 y1 <- f.b(x0)
 
 plot(f.a, verticals=TRUE, do.points=FALSE, col="blue")
 plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
 ## alternative, use standard R plot of ecdf
 #plot(f.a, col="blue")
 #lines(f.b, col="green")
 
 points(c(x0, x0), c(y0, y1), pch=16, col="red")
 segments(x0, y0, x0, y1, col="red", lty="dotted")
 ## alternative, down to x axis
 #segments(x0, 0, x0, y1, col="red", lty="dotted")
 
 
 Hope this helps,
 
 Rui Barradas
 maxbre wrote
 Hi all, 
 
 given this example 
 
 #start 
 
 a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 
 760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
 length(a)
 
 b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 
  3220,490,20790,290,740,5350,940,3910,0,640,850,260) 
 length(b)
 
 out <- ks.test(log10(a+1),log10(b+1)) 
 
 # max distance D 
 out$statistic 
 
 f.a <- ecdf(log10(a+1)) 
 f.b <- ecdf(log10(b+1)) 
 
 plot(f.a, verticals=TRUE, do.points=FALSE, col="red") 
 plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE) 
 
 #inverse of ecdf a
 x.a <- get("x", environment(f.a))
 y.a <- get("y", environment(f.a))
 
 # inverse of ecdf b
 x.b <- get("x", environment(f.b))
 y.b <- get("y", environment(f.b))
 
 
 #end
 
 I want to plot the max distance between the two ecdf curves as in the
 above given chart
 
 Is that possible and how? 
 
 
 Thanks for your help
 
 PS: this is an amended version of a previous thread (but no reply
 followed) that I’ve deleted from Nabble repository because I realised it
 was not enough clear (now I hope it’s a little better, sorry for that)





--
View this message in context: 
http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glm (probit/logit) optimizer

2012-10-05 Thread Dimitris.Kapetanakis
Dear all, 

I am using the glm function in order to estimate a logit model, i.e. glm(Y ~
data[,2] + data[,3], family = binomial(link = "logit")).

I also created a function that estimates a logit model and I would like to
compare it with the glm function. 

So, does anyone know what optimizer or optimization method glm uses in order
to derive the result?

Thank you

Dimitris



--
View this message in context: 
http://r.789695.n4.nabble.com/glm-probit-logit-optimizer-tp4645141.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculating the mean in one column with empty cells

2012-10-05 Thread fxen3k
Hi all,

I recently tried to calculate the mean and the median just for one column.
In this column I have numbers with some empty cells due to missing data. 
So how can I calculate the mean just for the filled cells? 
I tried:
mean(dataSet2$ac_60d_4d_after_ann[!is.na(master$ac_60d_4d_after_ann)],
na.rm=TRUE)
But the output was different from the calculation I did in Microsoft Excel.

Thanks in advance,
Felix
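For what it's worth, na.rm = TRUE alone already drops the NAs, and the posted call indexes dataSet2 with NA positions taken from a different object (master), which can silently select the wrong rows. A sketch, reusing the column name from the post:

mean(dataSet2$ac_60d_4d_after_ann,   na.rm = TRUE)
median(dataSet2$ac_60d_4d_after_ann, na.rm = TRUE)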



--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating all possible ratios

2012-10-05 Thread Rui Barradas

Hello,

Comment out the second apply and all following instructions using 'r2'. 
In the end return 'r1', not cbind.
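A sketch of that change (the name pairwise.ratios1 is just for illustration), keeping only the a:b direction:

pairwise.ratios1 <- function(x, prefix = "probeset", char = ":") {
  cn <- colnames(x)
  if (length(cn) == 0)
    cn <- paste0(prefix, formatC(seq_len(ncol(x)), width = nchar(ncol(x)), flag = "0"))
  cmb <- combn(ncol(x), 2)
  r1  <- apply(cmb, 2, function(j) x[, j[1]] / x[, j[2]])   # one direction only
  colnames(r1) <- apply(cmb, 2, function(j) paste(cn[j], collapse = char))
  r1
}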


Hope this helps,

Rui Barradas
Em 04-10-2012 23:38, genome1976 escreveu:

Hi Rui,
  
A while ago you helped me with calculaing all possible ratios from a dataset.
  
This is the code I am using as suggested by you.
  
data <- read.table("new_data.txt", header=T, row.names=1, sep="\t")

pairwise.ratios <- function(x, prefix="probeset", char=":"){
     n <- ncol(x)
     cn <- colnames(x)
     if(length(cn) == 0){
     cn <- gsub(" ", "0", formatC(seq.int(n), width=nchar(n)))
     cn <- paste(prefix, cn, sep="")
     }
     cmb <- combn(n, 2)
     r1 <- apply(cmb, 2, function(j) x[, j[1]]/x[, j[2]])
     r2 <- apply(cmb, 2, function(j) x[, j[2]]/x[, j[1]])
     colnames(r1) <- apply(cmb, 2, function(j) paste(cn[j], collapse=char))
     colnames(r2) <- apply(cmb, 2, function(j) paste(cn[rev(j)], 
collapse=char))
     cbind(r1, r2)[, order(c(colnames(r1), colnames(r2)))]
}
results <- pairwise.ratios(data.t)
write.table(t(results), "ratios_results.txt", sep="\t")
  
It works perfectly fine, except that it gives both ratios a:b and b:a for any two variables a and b.

Can you suggest a way so that I get only one ratio and not both (combinations,
without caring for the order, rather than permutations)?
  
Thanks for any help.
  
Best Regards,

Som.

  




Date: Sat, 12 May 2012 15:20:52 -0700
From: ml-node+s789695n4629656...@n4.nabble.com
To: genome1...@hotmail.com
Subject: RE: Calculating all possible ratios

Hello,

Nothing wrong with me, maybe your R session has some conflicting objects.
Running the function in the previous post on the first 4 rows and first 6 columns 
of your dataset the result was (copypaste to your session)

result <- structure(c(8.74714923153198, 1.83094400392095, 9.92065138471113,
1.77145415014708, 1.01515180575001, 0.167175438316099, 0.222321656865252,
0.155576771874649, 3.09417748158541, 0.469647988505747, 1.29398633565582,
0.524043736521509, 3.75969597954255, 0.422694576901317, 9.75471698113208,
0.290397651827521, 4.9035575319622, 1.00105273231888, 1.01093964697178,
0.26895145631068, 0.114322960947685, 0.546166347992352, 0.100799832714726,
0.564507977763338, 0.11605516024473, 0.0913055986191245, 0.0224099858208782,
0.0878243288779063, 0.353735531392494, 0.256505926724138, 0.130433606169248,
0.295826869963301, 0.42981957664441, 0.230861553382365, 0.983273839877614,
0.163931791180376, 0.56058921623124, 0.546741314958369, 0.10190254729944,
0.151825242718447, 0.9850743448771, 5.98173996175908, 4.49798734905118,
6.4276947512815, 8.61659229879359, 10.9522309159971, 44.622964422,
11.3863665430362, 3.04799485560622, 2.8093121408046, 5.82033416762497,
3.36839317468124, 3.70358005398494, 2.52844904226946, 43.8765935747068,
1.86658746243623, 4.83036872336483, 5.98803713273998, 4.5471937427,
1.72873786407767, 0.323187666496628, 2.12925430210325, 0.772805687699305,
1.90823767237023, 2.82697074863659, 3.89854539725884, 7.66673581578674,
3.38035554418724, 0.328084543240185, 0.35595902124055, 0.1718114409242,
0.296877457036954, 1.21508737036511, 0.900024246342843, 7.53850076491586,
0.554147739185128, 1.58476931628683, 2.13149583692219, 0.781259909100518,
0.513223300970874, 0.265978952936953, 2.36577437858509, 0.102514506769826,
3.44355401535389, 2.32655759378615, 4.33160041310018, 1.01701068353905,
6.10009805175427, 0.270009014365446, 0.395499368696959, 0.0227911949977918,
0.535737017484743, 0.822986086753186, 1.11108117816092, 0.132652370966651,
1.8045729131197, 1.30424309801742, 2.36826490573261, 0.103635979283374,
0.926148867313916, 0.203933571388086, 0.998948374760994, 0.989178733859585,
3.71814309436142, 1.78383738225087, 1.82901853699522, 9.81329737579089,
6.58652001534723, 0.207023533247665, 0.166999632405824, 0.219915855047535,
0.578456699988768, 0.631006664328306, 0.469154094827586, 1.27998376513563,
1.9484696000908, 0.76672822844154, 0.422250060615857, 9.64915859255482,
1.07974002376127), .Dim = c(4L, 30L), .Dimnames = list(c("S1",
"S2", "S3", "S4"), c("P1:P2", "P1:P3", "P1:P4", "P1:P5", "P1:P6",
"P2:P1", "P2:P3", "P2:P4", "P2:P5", "P2:P6", "P3:P1", "P3:P2",
"P3:P4", "P3:P5", "P3:P6", "P4:P1", "P4:P2", "P4:P3", "P4:P5",
"P4:P6", "P5:P1", "P5:P2", "P5:P3", "P5:P4", "P5:P6", "P6:P1",
"P6:P2", "P6:P3", "P6:P4", "P6:P5")))

Rui Barradas




genome1976 wrote
Hi Rui,
Thanks once again. I really appreciate it.
I tried using the code with the following dataset:




   
   
   Sample   P1       P2       P3       P4       P5       P6       P7       P8       P9      P10
   S1       5292.9   605.1    5213.9   1710.6   1407.8   1079.4   1379.6   9321.4   6951    1205.8
   S2       104.6    57.129   625.69   222.72   247.46   104.49   330.29   1863.7   389.67  216.29
   S3       191.29   19.282   860.42   147.83   19.61    189.22   203.27   1799     369.9   175.73
   S4       41.553   23.457   267.09   79.293   143.09   154.5    52.567   613.54   408.86  61.715
   S5

[R] BCA Package

2012-10-05 Thread kokila
Hi. In R, how do I calculate the estimation, validation, and holdout samples
for the rankscore() function? I am waiting for a reply. Please reply.



--
View this message in context: 
http://r.789695.n4.nabble.com/BCA-Package-tp4645146.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Setting the desired reference category with contr.sum

2012-10-05 Thread autarkie
Hi,

I have 6 career types, represented as a factor in R, coded from 1 to 6. I
need to use the effect coding (also known as deviation coding) which is
normally done by contr.sum, e.g. 

contrasts(career) <- contr.sum(6)

However, this results in the 6th category being the reference, that is being
coded as -1: 

$contrasts
  [,1] [,2] [,3] [,4] [,5]
1    1    0    0    0    0
2    0    1    0    0    0
3    0    0    1    0    0
4    0    0    0    1    0
5    0    0    0    0    1
6   -1   -1   -1   -1   -1

In fact, I need number 1 to be the reference. How do I achieve that? Thank
you. 

Regards,
Maxim
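One workaround that appears to do this (a sketch, assuming the levels are the strings "1" to "6"): reorder the factor so the level you want as the reference comes last, then apply contr.sum, since the last level is the one coded -1:

career2 <- factor(career, levels = c("2", "3", "4", "5", "6", "1"))
contrasts(career2) <- contr.sum(6)
contrasts(career2)   # the row for "1" is now -1 -1 -1 -1 -1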



--
View this message in context: 
http://r.789695.n4.nabble.com/Setting-the-desired-reference-category-with-contr-sum-tp4645150.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bayesian inference distribution parameters

2012-10-05 Thread José Carlos Nogueira Cavalcante Filho
Dear all,

I'm having a hard time trying to estimate, in a Bayesian way, the parameters
alpha and beta for a beta-distributed set of data (frequencies of
occurrence). Do you have any suggestions or tutorials on how to estimate
this?

Best regards,

JC
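In case it is useful as a starting point, a bare-bones random-walk Metropolis sketch for this kind of problem (purely illustrative: simulated data, vague normal priors on log(alpha) and log(beta), no tuning or convergence checking):

set.seed(42)
x <- rbeta(100, 2, 5)                            # stand-in for the observed frequencies
logpost <- function(la, lb)                      # log-posterior on the log scale
  sum(dbeta(x, exp(la), exp(lb), log = TRUE)) +
    dnorm(la, 0, 10, log = TRUE) + dnorm(lb, 0, 10, log = TRUE)
n <- 5000
draws <- matrix(NA_real_, n, 2, dimnames = list(NULL, c("alpha", "beta")))
cur <- c(0, 0); lp <- logpost(cur[1], cur[2])
for (i in seq_len(n)) {
  prop    <- cur + rnorm(2, 0, 0.1)              # random-walk proposal
  lp.prop <- logpost(prop[1], prop[2])
  if (log(runif(1)) < lp.prop - lp) { cur <- prop; lp <- lp.prop }
  draws[i, ] <- exp(cur)
}
colMeans(draws[-(1:1000), ])                     # crude posterior means after burn-in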


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm (probit/logit) optimizer

2012-10-05 Thread Ben Bolker
Dimitris.Kapetanakis dimitrios.kapetanakis at gmail.com writes:

 I am using glm function in order to estimate a logit model i.e. glm(Y ~
 data[,2] + data[,3], family = binomial(link = "logit")).
 
 I also created a function that estimates logit model and I would like it to
 compare it with the glm function. 
 
 So, does anyone know what optimizer or optimization method glm uses in order
 to derive the result?

  Iteratively reweighted least squares.

  If you need the gory details, look at the code of glm.fit() by typing

glm.fit
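A small sketch for checking a hand-rolled estimator against glm()'s IRLS result on simulated data (all names invented):

set.seed(1)
x1 <- rnorm(200); x2 <- rnorm(200)
y  <- rbinom(200, 1, plogis(0.5 + x1 - 2 * x2))
fit.glm <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
## direct maximum likelihood via optim(), standing in for a hand-rolled estimator
nll <- function(b) -sum(dbinom(y, 1, plogis(b[1] + b[2]*x1 + b[3]*x2), log = TRUE))
fit.ml <- optim(c(0, 0, 0), nll, method = "BFGS")
cbind(glm = coef(fit.glm), optim = fit.ml$par)   # should agree to several decimals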

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-10-05 Thread Rui Barradas

Hello,

Try length.out = max(length(loga), length(logb))

Note also that all of the previous code and the line above assume that 
we are interested in the max distance, whereas the KS statistic computes 
the supremum of the distance. If it's a two sample test then their 
values are almost surely the same but not if it's a one sample test.
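An alternative sketch that avoids choosing a grid length at all (using loga, logb, f.a and f.b as defined in the quoted code): evaluate both ecdfs on the pooled data values, which is where the two-sample D is attained:

x  <- sort(unique(c(loga, logb)))
D  <- max(abs(f.a(x) - f.b(x)))                  # matches the two-sample ks.test statistic
x0 <- x[which.max(abs(f.a(x) - f.b(x)))]         # location of the maximum distance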


Hope this helps,

Rui Barradas
Em 05-10-2012 12:15, user1234 escreveu:

Rui,

Your response nearly answered a similar question of mine except that I also
have ecdfs of different lengths.

Do you know how I can adjust  x - seq(min(loga, logb), max(loga, logb),
length.out=length(loga))
to account for this?  It must be in length.out() but I'm unsure how to
proceed.

Any advice is much appreciated.

-L


Rui Barradas wrote

Hello,

Try the following.
(i've changed the color of the first ecdf.)


loga <- log10(a+1) # do this
logb <- log10(b+1) # only once

f.a <- ecdf(loga)
f.b <- ecdf(logb)
# (2) max distance D

x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
x0 <- x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )]
y0 <- f.a(x0)
y1 <- f.b(x0)

plot(f.a, verticals=TRUE, do.points=FALSE, col="blue")
plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
## alternative, use standard R plot of ecdf
#plot(f.a, col="blue")
#lines(f.b, col="green")

points(c(x0, x0), c(y0, y1), pch=16, col="red")
segments(x0, y0, x0, y1, col="red", lty="dotted")
## alternative, down to x axis
#segments(x0, 0, x0, y1, col="red", lty="dotted")


Hope this helps,

Rui Barradas
maxbre wrote

Hi all,

given this example

#start

a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)

length(a)

b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
  3220,490,20790,290,740,5350,940,3910,0,640,850,260)
length(b)

out <- ks.test(log10(a+1), log10(b+1))

# max distance D
out$statistic

f.a <- ecdf(log10(a+1))
f.b <- ecdf(log10(b+1))

plot(f.a, verticals=TRUE, do.points=FALSE, col="red")
plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)

#inverse of ecdf a
x.a <- get("x", environment(f.a))
y.a <- get("y", environment(f.a))

# inverse of ecdf b
x.b <- get("x", environment(f.b))
y.b <- get("y", environment(f.b))


#end

I want to plot the max distance between the two ecdf curves as in the
above given chart

Is that possible and how?


Thanks for your help

PS: this is an amended version of a previous thread (but no reply
followed) that I’ve deleted from Nabble repository because I realised it
was not enough clear (now I hope it’s a little better, sorry for that)





--
View this message in context: 
http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2012-10-05 Thread Jean V Adams
DL,

Looks like you have a typo in the expression() function.
You had only one s.
This works for me ...

m - 10
beta.q - matrix(rnorm(81), ncol=9)

for (i in 1:9){ 
   plot(seq(1/m, 1-1/m, 1/m), beta.q[, i], type="l", col=1, 
  ylim=range(beta.q), xlab="quantile", ylab=expression(beta[i])) 
   } 
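
One caveat worth noting: expression(beta[i]) draws the literal symbol i as the
subscript in every panel.  If the intent is beta_1, beta_2, ..., beta_9, here is
a small sketch with bquote(), which substitutes the current loop value:

for (i in 1:9){ 
   plot(seq(1/m, 1-1/m, 1/m), beta.q[, i], type="l", col=1, 
  ylim=range(beta.q), xlab="quantile", ylab=bquote(beta[.(i)]))   # .(i) inserts the value of i
   } 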

Jean



dick liang dickliang...@gmail.com wrote on 10/04/2012 08:12:07 AM:
 
 producing a multi-figure plot, i am try to add beta_1, beta_2,.. beta_9 
to
 ylab using expression or substitution, but cannot work out like
 
 for (i in 1:9){
plot(seq(1/m, 1-1/m, 1/m), beta.q[,i], type=l, col=1,
ylim=range(beta.q),
xlab=quantile, ylab=expresion(beta[i]))
 }
 
 any suggestions will be greatly appreciated.
 
 DL

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread PIKAL Petr
Hi

see inline

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of fxen3k
 Sent: Friday, October 05, 2012 11:28 AM
 To: r-help@r-project.org
 Subject: [R] Calculating the mean in one column with empty cells
 
 Hi all,
 
 I recently tried to calculate the mean and the median just for one
 column.
 In this column I have numbers with some empty cells due to missing
 data.
 So how can I calculate the mean just for the filled cells?
 I tried:
 mean(dataSet2$ac_60d_4d_after_ann[!is.na(master$ac_60d_4d_after_ann)],
 na.rm=TRUE)


mean(dataSet2$ac_60d_4d_after_ann, na.rm=TRUE)

shall suffice.

 But the output was different to the calculation I died in Microsoft
 Excel.

Hm. I am also sometimes dying from Excel performance.

There could be 2 options:
- Excel is wrong
- the values were not transferred from Excel to R correctly and got mangled
somehow.

Which one is true is difficult to decide based on information you revealed.

Regards
Petr


 
 Thanks in advance,
 Felix
 
 
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-
 empty-cells-tp4645135.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread Berend Hasselman

On 05-10-2012, at 11:27, fxen3k wrote:

 Hi all,
 
 I recently tried to calculate the mean and the median just for one column.
 In this column I have numbers with some empty cells due to missing data. 
 So how can I calculate the mean just for the filled cells? 
 I tried:
 mean(dataSet2$ac_60d_4d_after_ann[!is.na(master$ac_60d_4d_after_ann)],
 na.rm=TRUE)
 But the output was different to the calculation I died in Microsoft Excel.
 

No data == no can answer question.

What did you expect?
What did Excel give you?

But you are trying to calculate the mean of  dataSet2$ac_60d_4d_after_ann and 
indexing with the indices of  non-NA numbers of  master$ac_60d_4d_after_ann.

mean(dataSet2$ac_60d_4d_after_ann, na.rm=TRUE) 

should do what you want.

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread Ken Takagi
fxen3k f.sehardt at gmail.com writes:

 
 Hi all,
 
 I recently tried to calculate the mean and the median just for one column.
 In this column I have numbers with some empty cells due to missing data. 
 So how can I calculate the mean just for the filled cells? 
 I tried:
 mean(dataSet2$ac_60d_4d_after_ann[!is.na(master$ac_60d_4d_after_ann)],
 na.rm=TRUE)
 But the output was different to the calculation I died in Microsoft Excel.
 
 Thanks in advance,
 Felix
 
 --
 View this message in context:
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135.html
 Sent from the R help mailing list archive at Nabble.com.
 
 

Hi Felix,

Assuming that you have the data frame formatted properly, mean(yourdata$column,
na.rm = T) should work.  When converting an Excel table to R, I usually fill in
blank cells with NA before importing.  If you try to import a table with empty
cells using read.table() and whitespace separators, you usually get an error about
lines not having the expected number of elements.  But since you seem to
have already got your data into R, that may not be the problem.
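
A small sketch of the import route (the file and column names here are made up):
read.csv() and read.csv2() can map empty cells straight to NA via na.strings, so
no spreadsheet editing is needed.

dat <- read.csv2("mydata.csv", na.strings = c("", "NA"))  # ";" separated, "," decimal
mean(dat$ac_60d_4d_after_ann, na.rm = TRUE)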

HTH,
Ken

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to add a new row in to an existing data set in R Language?

2012-10-05 Thread Ista Zahn
Hi,

On Fri, Oct 5, 2012 at 12:27 AM, killerkarthick karthick@gmail.com wrote:
 Hi Ista Zahn,
  Thanks for your advice. Please see the
 following Image.




 http://r.789695.n4.nabble.com/file/n4645113/data.png

To me this example looks plain wrong. You end up with a row that
contains the date of birth for subject 104 and the date of informed
consent for subject 105.

If this really is the situation you face, I suggest correcting the ID
before merging.







  i am expecting the result should same in the image.

OK, then fix your ID numbers and use merge(DF1, DF2).
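
A tiny sketch with made-up frames (the real column names are whatever your two
data sets use); merge() matches rows on the key, not on their position:

DF1 <- data.frame(id = 101:105, dob     = as.Date("1980-01-01") + 0:4)
DF2 <- data.frame(id = 101:105, consent = as.Date("2012-10-01") + 0:4)
merge(DF1, DF2, by = "id")   # one row per id, with both columns attached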

Best,
Ista





 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-add-a-new-row-in-to-an-existing-data-set-in-R-Language-tp4644855p4645113.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] avoid <<- in specific case

2012-10-05 Thread Sarah Goslee
Hi Berry,

You might look at scatterplot3d in the scatterplot3d package for an
example of how a similar problem was handled, precisely as Duncan
suggests.

Sarah

On Fri, Oct 5, 2012 at 8:25 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 On 05/10/2012 8:19 AM, Berry Boessenkool wrote:


 Hi all,

 I improved a function drawing horizontal histograms (code below) which
 uses barplot and works fine.
 It does,however, need to assign a function to the global environment to
 later find the actual location on the vertical axis, and not the number of
 bars used by barplot.
 Hopefully, running the examples below will illustrate that.

 As said, it works perfectly fine and does exactly what I want.
 The problem arises now that I'm building a package (just for myself, for
 now) and run into R CMD
  check telling me 'no visible binding for '<<-' assignment'.
 (wording below)
 Similar problem as in
 http://r.789695.n4.nabble.com/R-CMD-check-tells-me-no-visible-binding-for-global-variable-what-does-it-mean-td1837236.html

 My question is:
 Is there a way to avoid assigning the function to the global environment
  with <<- but still have it available? I know it is generally not good
 practice.


 You can return the function as the value of your function.   A bonus:  if it
 is created within the body of your function, it will have access to all the
 local variables there.

 You shouldn't write to the global environment, because globalenv belongs to
 the user, not to you.  If the user wants your function in the global
 environment s/he can just assign the value of your function to a variable
 there.

 Duncan Murdoch
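
 A minimal sketch of that suggestion (horiz.hist2 is a made-up name; it mirrors
 the horiz.hist() body quoted below): build hpos() locally, draw the plot, and
 hand the converter back to the caller instead of assigning it globally.

 horiz.hist2 <- function(Data, breaks = "Sturges", ...) {
   a <- hist(Data, plot = FALSE, breaks = breaks)
   HBreaks <- a$breaks
   hpos <- function(Pos) (Pos - HBreaks[1]) * (length(HBreaks) - 1) / diff(range(HBreaks))
   barplot(a$counts, space = 0, horiz = TRUE, ...)
   invisible(hpos)   # the caller keeps the coordinate converter
 }
 # usage: hp <- horiz.hist2(ExampleData); abline(h = hp(11), col = 2)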



 Or ist it OK in a case like this, and is there a way to avoid the notes
 from the rcmd check (make the function package-compatible)?
 Or should I just ignore these notes? (I'm completely new to building
 packages and cannot judge the importance yet.)

 I'd be very thankful for any hints!

 Berry

 PS: I recently read about barcharts in lattice, but by now I'm already
 used to my function. (And I learned a lot writing it a couple of years ago).

   # Function
 horiz.hist <- function(Data, breaks="Sturges", col="transparent", las=1,
 ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat,
 border=par("fg"), ... )
{a <- hist(Data, plot=FALSE, breaks=breaks)
HBreaks <- a$breaks
HBreak1 <- a$breaks[1]
hpos <<- function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/
 diff(range(HBreaks)) # Assign a function to the global environment with
 values calculated inside the main function.
barplot(a$counts, space=0, horiz=TRUE, ylim=hpos(ylim), col=col,
 border=border,...)
axis(2, at=hpos(labelat), labels=labels, las=las, ...)
print("use hpos() to address y-coordinates") }

 # Data and basic concept
 set.seed(8); ExampleData - rnorm(50,8,5)+5
 hist(ExampleData)
 horiz.hist(ExampleData, xlab=absolute frequency)
 # Caution: the labels at the y-axis are not the real coordinates!
 # abline(h=2) will draw above the second bar, not at the label value 2.
 Use hpos:
 abline(h=hpos(11), col=2)

 # Further arguments
 horiz.hist(ExampleData, xlim=c(-8,20))
 horiz.hist(ExampleData, main=the ... argument worked!, col.axis=3)
 hist(ExampleData, xlim=c(-10,40)) # with xlim
 horiz.hist(ExampleData, ylim=c(-10,40), border=red) # with ylim
 horiz.hist(ExampleData, breaks=20, col=orange)
 axis(2, hpos(0:10), labels=F, col=2) # another use of hpos()

 # One shortcoming: doesn't work with breakpoints provided as a vector with
 different widths of the bars


 Wording from the rcmd check when building a package:
 * checking R code for possible problems ... NOTE
 horiz.hist: no visible binding for '<<-' assignment to 'hpos'
 horiz.hist: no visible global function definition for 'hpos'






-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Setting the desired reference category with contr.sum

2012-10-05 Thread Ista Zahn
Hi,

The contr. functions return matrices, so you can can create any
arbitrary matrix you like. For example

contrasts(career) - cbind(c1=c(-1, 1, 0, 0, 0, 0),
  c2=c(-1, 0, 1, 0, 0, 0),
  c3=c(-1, 0, 0, 1, 0, 0),
  c4=c(-1, 0, 0, 0, 1, 0),
  c5=c(-1, 0, 0, 0, 0, 1))
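
An equivalent shortcut (a sketch, assuming career is a factor with levels "1"
to "6"): reorder the levels so that "1" comes last, and contr.sum() will then
use it as the -1 reference.

career <- factor(career, levels = c("2", "3", "4", "5", "6", "1"))
contrasts(career) <- contr.sum(6)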

Best,
Ista

On Fri, Oct 5, 2012 at 8:01 AM, autarkie kovla...@hotmail.com wrote:
 Hi,

 I have 6 career types, represented as a factor in R, coded from 1 to 6. I
 need to use the effect coding (also known as deviation coding) which is
 normally done by contr.sum, e.g.

 contrasts(career) - contr.sum(6)

 However, this results in the 6th category being the reference, that is being
 coded as -1:

 $contrasts
   [,1] [,2] [,3] [,4] [,5]
 110000
 201000
 300100
 400010
 500001
 6   -1   -1   -1   -1   -1

 In fact, I need number 1 to be the reference. How do I achieve that? Thank
 you.

 Regards,
 Maxim



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Setting-the-desired-reference-category-with-contr-sum-tp4645150.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with making figures

2012-10-05 Thread Jean V Adams
It's helpful to provide reproducible code in your posting to R help.  The 
dput() function can be used to share some of your data.  For example, you 
might have used 
dput(mydata[1:10, 1:10])

# here's some data I made up as an example ...
df - structure(list(`2000` = c(44L, 31L, 55L, 83L, 39L, 12L, 21L, 
20L, 52L, 63L, 92L, 90L, 22L, 71L, 23L, 46L, 84L, 9L, 98L, 47L
), `2001` = c(88L, 11L, 61L, 86L, 6L, 78L, 97L, 70L, 10L, 72L, 
14L, 37L, 94L, 60L, 8L, 19L, 73L, 57L, 2L, 30L), `2002` = c(29L, 
87L, 56L, 17L, 4L, 95L, 3L, 77L, 53L, 24L, 79L, 48L, 59L, 42L, 
54L, 28L, 25L, 18L, 43L, 15L), `2003` = c(16L, 40L, 58L, 65L, 
13L, 38L, 76L, 41L, 1L, 66L, 32L, 45L, 5L, 51L, 33L, 82L, 68L, 
74L, 91L, 69L), `2004` = c(67L, 7L, 75L, 80L, 99L, 89L, 81L, 
93L, 62L, 85L, 64L, 35L, 100L, 34L, 50L, 49L, 27L, 96L, 36L, 
26L)), .Names = c(2000, 2001, 2002, 2003, 2004), row.names = 
c(Site A, 
Site B, Site C, Site D, Site E, Site F, Site G, Site H, 
Site I, Site J, Site K, Site L, Site M, Site N, Site O, 
Site P, Site Q, Site R, Site S, Site T), class = data.frame)

# transpose the data (switch columns and rows)
df.turned - as.data.frame(t(df))

# site names
sites - names(df.turned)

# years
year - as.numeric(dimnames(df.turned)[[1]])

# a separate plot for each site
for(i in seq(sites)) {
plot(year, df.turned[, i], type=b, xlab=Year, ylab=My data, 
main=sites[i])
}

Jean



megalops megalop...@hotmail.com wrote on 10/04/2012 03:01:17 PM:
 
 I need to make about 30 figures and I am trying to create a program in R 
that
 will make my life a lot easier.  First I will tell you how my data is 
setup. 
 I have 30 sites and then data for each year at the site.  I have 10 
years of
 data for each site.   Below is a small chunk of my data to show the 
 format. 
 2000   2001   2002   2003   2004
 Site A   50   75   25   55   60
 Site B   58   22   68   77   30
 
 I am trying to write a program in R that will create figures showing the
 annual data for each individual site.  As opposed to making 30 
individual
 graphs in Excel.  Any help would be greatly appreciated. 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] avoid <<- in specific case

2012-10-05 Thread Frans Marcelissen
I agree with this. But there is (in my opinion) a much more dangerous point:
a<-b
If there is an object called b in the function, it uses this b. But if you
forgot to initialise b in the function, it looks in the global environment. So
the following happens:
b<-100
a<-b (if there is no b in the function, it uses the b in the global environment,
so a=100)
b<-9 (writes to b in the local environment)
Now b in the global environment is 100, while b in the function is 9.
It would be more logical if a<-b would give an error if b does not exist in
the function (just like Pascal does).
---
Frans
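
A tiny sketch of the behaviour described above (run at the top level; f is a
made-up name):

b <- 100
f <- function() {
  a <- b                  # no local b yet, so the global b (100) is read
  b <- 9                  # this creates a *local* b; the global one is untouched
  c(a = a, local_b = b)
}
f()                       # a = 100, local_b = 9
b                         # still 100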

-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
Namens Duncan Murdoch
Verzonden: vrijdag 5 oktober 2012 14:26
Aan: Berry Boessenkool
CC: R Help
Onderwerp: Re: [R] avoid - in specific case

On 05/10/2012 8:19 AM, Berry Boessenkool wrote:

 Hi all,

 I improved a function drawing horizontal histograms (code below) which
uses barplot and works fine.
 It does,however, need to assign a function to the global environment to
later find the actual location on the vertical axis, and not the number of
bars used by barplot.
 Hopefully, running the examples below will illustrate that.

 As said, it works perfectly fine and does exactly what I want.
 The problem arises now that I'm building a package (just for myself, 
 for now) and run into R CMD check telling me 'no visible binding for '-'
assignment'.
 (wording below)
 Similar problem as in 
 http://r.789695.n4.nabble.com/R-CMD-check-tells-me-no-visible-binding-
 for-global-variable-what-does-it-mean-td1837236.html

 My question is:
 Is there a way to avoid assigning the function to the global environment
with - but still have it available? I know it is generally not good
practice.

You can return the function as the value of your function.   A bonus:  
if it is created within the body of your function, it will have access to
all the local variables there.

You shouldn't write to the global environment, because globalenv belongs to
the user, not to you.  If the user wants your function in the global
environment s/he can just assign the value of your function to a variable
there.

Duncan Murdoch


 Or ist it OK in a case like this, and is there a way to avoid the notes
from the rcmd check (make the function package-compatible)?
 Or should I just ignore these notes? (I'm completely new to building 
 packages and cannot judge the importance yet.)

 I'd be very thankful for any hints!

 Berry

 PS: I recently read about barcharts in lattice, but by now I'm already
used to my function. (And I learned a lot writing it a couple of years ago).

   
 # Function
 horiz.hist - function(Data, breaks=Sturges, col=transparent, 
 las=1, ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat,
border=par(fg), ... )
{a - hist(Data, plot=FALSE, breaks=breaks)
HBreaks - a$breaks
HBreak1 - a$breaks[1]
hpos - function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/
diff(range(HBreaks)) # Assign a function to the global environment with
values calculated inside the main function.
barplot(a$counts, space=0, horiz=T, ylim=hpos(ylim), col=col,
border=border,...)
axis(2, at=hpos(labelat), labels=labels, las=las, ...)
print(use hpos() to address y-coordinates) }

 # Data and basic concept
 set.seed(8); ExampleData - rnorm(50,8,5)+5
 hist(ExampleData)
 horiz.hist(ExampleData, xlab=absolute frequency) # Caution: the 
 labels at the y-axis are not the real coordinates!
 # abline(h=2) will draw above the second bar, not at the label value 2.
Use hpos:
 abline(h=hpos(11), col=2)

 # Further arguments
 horiz.hist(ExampleData, xlim=c(-8,20)) horiz.hist(ExampleData, 
 main=the ... argument worked!, col.axis=3) hist(ExampleData, 
 xlim=c(-10,40)) # with xlim horiz.hist(ExampleData, ylim=c(-10,40), 
 border=red) # with ylim horiz.hist(ExampleData, breaks=20, 
 col=orange) axis(2, hpos(0:10), labels=F, col=2) # another use of 
 hpos()

 # One shortcoming: doesn't work with breakpoints provided as a vector 
 with different widths of the bars


 Wording from the rcmd check when building a package:
 * checking R code for possible problems ... NOTE
 horiz.hist: no visible binding for '-' assignment to 'hpos'
 horiz.hist: no visible global function definition for 'hpos'


   
   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__

Re: [R] Error in lmer: asMethod(object) : matrix is not symmetric [1, 2]

2012-10-05 Thread Ben Bolker
Olivier Blaser olivier.blaser at unil.ch writes:


 I am having trouble with lmer. I am looking at recombinant versus non 
 recombinant individuals. In the response variable recombinant 
 individuals are coded as 1's and non-recombinant as 0's. I built a model 
 with 2 fixed factors and 1 random effect. Sex (males/females) is the 
 first fixed effect and sexual genotype (XY, YY, WX and WY) the second 
 one. Sexual Genotype is nested in sex. XY and YY individuals are males 
 and WX and WY females. I crossed 8 XY males with 8 WY females and 8 YY 
 males with 8 WX females. Each cross corresponds to a family (i.e. family 
 1 to 8 for XY x WY and family 9 to 16 for YY WX). For each family I have 
 20 offspring. Family is nested in sexual genotype as a random factor.
 
 My data looks like:


reps <- factor(sample(c(0,1), 640, replace=TRUE, prob=c(0.95,0.05)))
sex  <- factor(rep(c("M","F"), each=320))
geno <- factor(rep(c("XY","YY","WY","WX"), each=160))
fam  <- factor(rep(c(1:8, 9:16, 1:8, 9:16), each=20))
dat  <- data.frame(reps, sex, geno, fam)

with(dat,table(sex,geno))
with(dat,table(sex,geno,fam))

 Follow-ups to this question should go to r-sig-mixed-models

 The slightly obscure error message is caused by a glitch
in lme4's printing code, where it tries to print a model
that hasn't been successfully fitted in the first place.
I actually get a **slightly** more informative error message
(maybe with a slightly later version of the package):

g1 <- glmer(reps~sex+geno+(1|fam), data=dat, family=binomial)
Error in mer_finalize(ans) : Downdated X'X is not positive definite, 1.

[although lmer() automatically calls glmer() for GLMMs, I
 prefer to use glmer() explicitly]

  To diagnose/show what the problem is, let's try fitting in glm.
It works, but the answer indicates that
your design matrix is rank-deficient/overfitted:

(g0 <- glm(reps~sex+geno, data=dat, family=binomial))

i.e., the 'genoYY' estimate is NA, because once 'sexM',
'genoWY', and 'genoXY' parameters are fitted the model is
completely determined.

  Unlike [g]lm, [g]lmer can't handle rank-deficient/overfitted
fixed-effect design matrices (as I understand it, because the
somewhat fancier/more efficient linear algebra primitives used
in lme4 don't handle this case).  Therefore, you have to figure
out a way to code your design matrix that will work.  At the moment
I don't know of an elegant way to do that, but here's a hack:

## extract model matrix and drop the intercept and genoYY dummy variables
mm <- model.matrix(g0)[, 2:4]
dat2 <- data.frame(reps, mm, fam)
g2 <- glmer(reps ~ sexM + genoWY + genoXY + (1|fam), data=dat2, family=binomial)

  I don't know if this gives you the answer you want or not ...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question about the Anova function in the car package - repeated measures

2012-10-05 Thread Diego Pujoni
Hi everyone, I am running a repeated-measures ANOVA and I am using the Anova
function from the car package.

I measured algal biovolume every two days for 10 days (in the data below I only
show up to day 4). I have 2 treatments (c, t) and the experiment was run in
triplicate (A, B, C).

> Pa2
   Day Type Replicate logbiovolume
1    0    c         A        19.34
2    0    c         B        18.27
3    0    c         C        18.56
4    0    t         A        18.41
5    0    t         B        18.68
6    0    t         C        18.86
7    2    c         A        18.81
8    2    c         B        18.84
9    2    c         C        18.52
10   2    t         A        18.29
11   2    t         B        17.91
12   2    t         C        17.67
13   4    c         A        19.16
14   4    c         B        18.85
15   4    c         C        19.36
16   4    t         A        19.05
17   4    t         B        19.09
18   4    t         C        18.26
.
.
.

Pa2.teste = within(Pa2, {group = factor(Type)
   time = factor(Day)
   id = factor(Replicate)})
matrix = with(Pa2.teste, cbind(Pa2[,VAR][group=="c"], Pa2[,VAR][group=="t"]))
matrix
   [,1]  [,2]
 [1,] 19.34 18.41
 [2,] 18.27 18.68
 [3,] 18.56 18.86
 [4,] 18.81 18.29
 [5,] 18.84 17.91
 [6,] 18.52 17.67
 [7,] 19.16 19.05
 [8,] 18.85 19.09
 [9,] 19.36 18.26
[10,] 19.63 18.96
[11,] 19.94 18.06
[12,] 19.54 18.37
[13,] 19.98 17.96
[14,] 20.99 17.93
[15,] 20.45 17.74
[16,] 21.12 17.60
[17,] 21.66 17.33
[18,] 21.51 18.12
 model <- lm(matrix ~ 1)
 design <- factor(c("c","t"))

 options(contrasts=c("contr.sum", "contr.poly"))
 aov <- Anova(model, idata=data.frame(design), idesign=~design, type="III")
 summary(aov, multivariate=F)

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                 SS num Df Error SS den Df         F    Pr(>F)
(Intercept) 12951.2      1   6.3312     17 34775.336 < 2.2e-16 ***
design         19.1      1  17.3901     17    18.697 0.0004606 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


The problem is that I think this function is not taking the days into account,
nor the replicates. How do I bring these into the analysis? Do you know of a
corresponding non-parametric function for this test? Something like a Friedman
test with two groups (treatment and replicate) and one block (time)?

Thank you very much

   Diego PJ
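
One hedged sketch of a way to bring Day and the replicates into the analysis,
assuming the long-format Pa2 shown above (flask and fDay are made-up names for
the experimental unit and the day factor); it is the classical univariate
repeated-measures aov() route rather than the car::Anova() one:

Pa2$fDay  <- factor(Pa2$Day)
Pa2$flask <- interaction(Pa2$Type, Pa2$Replicate)   # the unit measured repeatedly
summary(aov(logbiovolume ~ Type * fDay + Error(flask/fDay), data = Pa2))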

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] avoid <<- in specific case

2012-10-05 Thread Duncan Murdoch

On 05/10/2012 9:38 AM, Frans Marcelissen wrote:

I agree with this. But there is (in my opinion) a much more dangerous point:
ab
If  there is an object  called b in the function, it uses this b. But if you
forgot to inialise b in the function, it looks in the global environment. So
the following happens:
b-100
a-b (if there is no b in the function, it uses b in the global environment.
So a=100
b9 (writes to b in the local environment)
Now b in the global environment is 100, while b in the function is 9
It would be more logical if a-b would give an error if b does not exist in
the function (just like pascal does).


If functions didn't have access to objects outside the local frame, they 
couldn't do much of anything at all.  Unlike Pascal, functions in R are 
first class objects.  Even <- is an object (found in the base 
package).  So functions need to be able to look outside themselves to 
find things like mean(), var(), +, -, etc.


Duncan Murdoch


---
Frans

-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
Namens Duncan Murdoch
Verzonden: vrijdag 5 oktober 2012 14:26
Aan: Berry Boessenkool
CC: R Help
Onderwerp: Re: [R] avoid - in specific case

On 05/10/2012 8:19 AM, Berry Boessenkool wrote:

 Hi all,

 I improved a function drawing horizontal histograms (code below) which
uses barplot and works fine.
 It does,however, need to assign a function to the global environment to
later find the actual location on the vertical axis, and not the number of
bars used by barplot.
 Hopefully, running the examples below will illustrate that.

 As said, it works perfectly fine and does exactly what I want.
 The problem arises now that I'm building a package (just for myself,
 for now) and run into R CMD check telling me 'no visible binding for '-'
assignment'.
 (wording below)
 Similar problem as in
 http://r.789695.n4.nabble.com/R-CMD-check-tells-me-no-visible-binding-
 for-global-variable-what-does-it-mean-td1837236.html

 My question is:
 Is there a way to avoid assigning the function to the global environment
with - but still have it available? I know it is generally not good
practice.

You can return the function as the value of your function.   A bonus:
if it is created within the body of your function, it will have access to
all the local variables there.

You shouldn't write to the global environment, because globalenv belongs to
the user, not to you.  If the user wants your function in the global
environment s/he can just assign the value of your function to a variable
there.

Duncan Murdoch


 Or ist it OK in a case like this, and is there a way to avoid the notes
from the rcmd check (make the function package-compatible)?
 Or should I just ignore these notes? (I'm completely new to building
 packages and cannot judge the importance yet.)

 I'd be very thankful for any hints!

 Berry

 PS: I recently read about barcharts in lattice, but by now I'm already
used to my function. (And I learned a lot writing it a couple of years ago).


 # Function
 horiz.hist - function(Data, breaks=Sturges, col=transparent,
 las=1, ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat,
border=par(fg), ... )
{a - hist(Data, plot=FALSE, breaks=breaks)
HBreaks - a$breaks
HBreak1 - a$breaks[1]
hpos - function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/
diff(range(HBreaks)) # Assign a function to the global environment with
values calculated inside the main function.
barplot(a$counts, space=0, horiz=T, ylim=hpos(ylim), col=col,
border=border,...)
axis(2, at=hpos(labelat), labels=labels, las=las, ...)
print(use hpos() to address y-coordinates) }

 # Data and basic concept
 set.seed(8); ExampleData - rnorm(50,8,5)+5
 hist(ExampleData)
 horiz.hist(ExampleData, xlab=absolute frequency) # Caution: the
 labels at the y-axis are not the real coordinates!
 # abline(h=2) will draw above the second bar, not at the label value 2.
Use hpos:
 abline(h=hpos(11), col=2)

 # Further arguments
 horiz.hist(ExampleData, xlim=c(-8,20)) horiz.hist(ExampleData,
 main=the ... argument worked!, col.axis=3) hist(ExampleData,
 xlim=c(-10,40)) # with xlim horiz.hist(ExampleData, ylim=c(-10,40),
 border=red) # with ylim horiz.hist(ExampleData, breaks=20,
 col=orange) axis(2, hpos(0:10), labels=F, col=2) # another use of
 hpos()

 # One shortcoming: doesn't work with breakpoints provided as a vector
 with different widths of the bars


 Wording from the rcmd check when building a package:
 * checking R code for possible problems ... NOTE
 horiz.hist: no visible binding for '-' assignment to 'hpos'
 horiz.hist: no visible global function definition for 'hpos'



[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide 

Re: [R] avoid <<- in specific case

2012-10-05 Thread Bert Gunter
Inline.

On Fri, Oct 5, 2012 at 6:38 AM, Frans Marcelissen
frans.marcelis...@digipsy.nl wrote:
 I agree with this. But there is (in my opinion) a much more dangerous point:
 ab
 If  there is an object  called b in the function, it uses this b. But if you
 forgot to inialise b in the function, it looks in the global environment. So
 the following happens:
 b-100
 a-b (if there is no b in the function, it uses b in the global environment.
 So a=100
 b9 (writes to b in the local environment)
 Now b in the global environment is 100, while b in the function is 9
 It would be more logical if a-b would give an error if b does not exist in
 the function (just like pascal does).

Oy!
What you find logical others (like me) would find objectionable. You
have just asked to break R's entire scoping procedure. Do you really
think that is reasonable?

It is the duty of a programmer to learn and use the paradigms of the
language in which he/she programs. These will, of course, be different
from language to language -- appropriately  so. To complain that one
language does not follow the procedures of another, especially without
understanding the big picture of each language's design (which few
of us do) just does not make sense to me.

All IMHO, of course.

Cheers,
Bert


 ---
 Frans

 -Oorspronkelijk bericht-
 Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 Namens Duncan Murdoch
 Verzonden: vrijdag 5 oktober 2012 14:26
 Aan: Berry Boessenkool
 CC: R Help
 Onderwerp: Re: [R] avoid - in specific case

 On 05/10/2012 8:19 AM, Berry Boessenkool wrote:

 Hi all,

 I improved a function drawing horizontal histograms (code below) which
 uses barplot and works fine.
 It does,however, need to assign a function to the global environment to
 later find the actual location on the vertical axis, and not the number of
 bars used by barplot.
 Hopefully, running the examples below will illustrate that.

 As said, it works perfectly fine and does exactly what I want.
 The problem arises now that I'm building a package (just for myself,
 for now) and run into R CMD check telling me 'no visible binding for '-'
 assignment'.
 (wording below)
 Similar problem as in
 http://r.789695.n4.nabble.com/R-CMD-check-tells-me-no-visible-binding-
 for-global-variable-what-does-it-mean-td1837236.html

 My question is:
 Is there a way to avoid assigning the function to the global environment
 with - but still have it available? I know it is generally not good
 practice.

 You can return the function as the value of your function.   A bonus:
 if it is created within the body of your function, it will have access to
 all the local variables there.

 You shouldn't write to the global environment, because globalenv belongs to
 the user, not to you.  If the user wants your function in the global
 environment s/he can just assign the value of your function to a variable
 there.

 Duncan Murdoch


 Or ist it OK in a case like this, and is there a way to avoid the notes
 from the rcmd check (make the function package-compatible)?
 Or should I just ignore these notes? (I'm completely new to building
 packages and cannot judge the importance yet.)

 I'd be very thankful for any hints!

 Berry

 PS: I recently read about barcharts in lattice, but by now I'm already
 used to my function. (And I learned a lot writing it a couple of years ago).


 # Function
 horiz.hist - function(Data, breaks=Sturges, col=transparent,
 las=1, ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat,
 border=par(fg), ... )
{a - hist(Data, plot=FALSE, breaks=breaks)
HBreaks - a$breaks
HBreak1 - a$breaks[1]
hpos - function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/
 diff(range(HBreaks)) # Assign a function to the global environment with
 values calculated inside the main function.
barplot(a$counts, space=0, horiz=T, ylim=hpos(ylim), col=col,
 border=border,...)
axis(2, at=hpos(labelat), labels=labels, las=las, ...)
print(use hpos() to address y-coordinates) }

 # Data and basic concept
 set.seed(8); ExampleData - rnorm(50,8,5)+5
 hist(ExampleData)
 horiz.hist(ExampleData, xlab=absolute frequency) # Caution: the
 labels at the y-axis are not the real coordinates!
 # abline(h=2) will draw above the second bar, not at the label value 2.
 Use hpos:
 abline(h=hpos(11), col=2)

 # Further arguments
 horiz.hist(ExampleData, xlim=c(-8,20)) horiz.hist(ExampleData,
 main=the ... argument worked!, col.axis=3) hist(ExampleData,
 xlim=c(-10,40)) # with xlim horiz.hist(ExampleData, ylim=c(-10,40),
 border=red) # with ylim horiz.hist(ExampleData, breaks=20,
 col=orange) axis(2, hpos(0:10), labels=F, col=2) # another use of
 hpos()

 # One shortcoming: doesn't work with breakpoints provided as a vector
 with different widths of the bars


 Wording from the rcmd check when building a package:
 * checking R code for possible problems ... NOTE
 horiz.hist: no visible binding for '-' assignment to 'hpos'
 horiz.hist: no visible 

Re: [R] Question about the Anova function in the car package - repeated measures

2012-10-05 Thread Rui Barradas
Homework? We don't do homework.
You could at least have tried to run that code...

Rui Barradas
Em 05-10-2012 14:56, Diego Pujoni escreveu:
 Olá pessoal, estou realizando uma ANOVA com medidas repetidas e estou
 utilizando a função Anova do pacote car.

 Medi o biovolume de algas a cada dois dias durante 10 dias (no banco de
 dados abaixo só coloquei até o 4° dia). Tenho 2 tratamentos (c,t) e o
 experimento foi realizado em tréplicas (A,B,C).

 Pa2
 Day Type Replicate logbiovolume
 10c A19.34
 20c B18.27
 30c C18.56
 40t A18.41
 50t B18.68
 60t C18.86
 72c A18.81
 82c B18.84
 92c C18.52
 10   2t A18.29
 11   2t B17.91
 12   2t C17.67
 13   4c A19.16
 14   4c B18.85
 15   4c C19.36
 16   4t A19.05
 17   4t B19.09
 18   4t C18.26
 .
 .
 .

 Pa2.teste = within(Pa2,{group = factor(Type)
 time = factor(Day)
 id = factor(Replicate)})
 matrix = with(Pa2.teste,cbind(Pa2[,VAR][group==c],Pa2[,VAR][group==t]))
 matrix
 [,1]  [,2]
   [1,] 19.34 18.41
   [2,] 18.27 18.68
   [3,] 18.56 18.86
   [4,] 18.81 18.29
   [5,] 18.84 17.91
   [6,] 18.52 17.67
   [7,] 19.16 19.05
   [8,] 18.85 19.09
   [9,] 19.36 18.26
 [10,] 19.63 18.96
 [11,] 19.94 18.06
 [12,] 19.54 18.37
 [13,] 19.98 17.96
 [14,] 20.99 17.93
 [15,] 20.45 17.74
 [16,] 21.12 17.60
 [17,] 21.66 17.33
 [18,] 21.51 18.12
   model - lm(matrix ~ 1)
   design - factor(c(c,t))

   options(contrasts=c(contr.sum, contr.poly))
   aov - Anova(model, idata=data.frame(design), idesign=~design, type=III)
   summary(aov, multivariate=F)

 Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

   SS num Df Error SS den Df FPr(F)
 (Intercept) 12951.2  1   6.3312 17 34775.336  2.2e-16 ***
 design 19.1  1  17.3901 1718.697 0.0004606 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


 O problema é que eu acho que esta função não está levando em consideração
 os dias, nem as réplicas. Como faço para introduzir isto na análise. Vocês
 conhecem alguma função correspondente não paramétrica para este teste? Tipo
 um teste de Friedman com dois grupos (tratamento e réplica) e um bloco
 (tempo)?

 Muito Obrigado

 Diego PJ

   [[alternative HTML version deleted]]



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread arun
HI,
Not sure how your dataset looks like:

set.seed(1)
dat1 <- data.frame(col1=c(sample(1:50,5,replace=TRUE),NA,sample(1:25,3,replace=TRUE),NA,sample(1:25,2,replace=TRUE)), col2=c(NA,NA,rnorm(8,15),NA,NA))
mean(dat1$col1[!is.na(dat1$col1)])
#[1] 20.1
 mean(dat1$col1,na.rm=TRUE)
#[1] 20.1

But there is one obvious problem: the code mixes dataSet2 and master.  It looks like 
you have two different data sets, and the indexing uses one while the mean uses the other.
A.K.






- Original Message -
From: fxen3k f.seha...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, October 5, 2012 5:27 AM
Subject: [R] Calculating the mean in one column with empty cells

Hi all,

I recently tried to calculate the mean and the median just for one column.
In this column I have numbers with some empty cells due to missing data. 
So how can I calculate the mean just for the filled cells? 
I tried:
mean(dataSet2$ac_60d_4d_after_ann[!is.na(master$ac_60d_4d_after_ann)],
na.rm=TRUE)
But the output was different to the calculation I died in Microsoft Excel.

Thanks in advance,
Felix



--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] jointModel error messages

2012-10-05 Thread Charles Graham
I contacted the package developer and that led to me removing events at time
0 (or subjects with only 1 longitudinal measurement).  I then still had the
error message "Can't fit a Cox model with 0 failures", which I have managed
to avoid by adding 1.8*10^(-15) to all my survival times; any number greater
than this also works, but nothing smaller!  Any explanation of this would
help!
From there I have been able to fit a few models, but I run into problems when I
try to include a variable called logrna (=log(rna)) in the fixed effects of
the lme model; it produces the error "Error in if (t1 || t2) { : missing
value where TRUE/FALSE needed".  However, if I include the term exp(logrna)
in the fixed effects, the joint model works fine, but this is not what I
want!  This is where I really need help: what is happening?

Charles


Charles Graham wrote
 I am trying to use the jointModel function in the JM package to fit a
 simple joint model to longitudinal and survival data.
 I have come accross a range of errors when trying different things and
 just can't seem to get around them all.
 
 The code I use is as follows:
 fitLME = lme(cd4~trt+time, random=~time|num, data=mnuts2); summary(fitLME)
 fitSURV = coxph(Surv(fail.time, SI.code)~trt, x=TRUE, data=cov);
 summary(fitSURV)
 fitJM = jointModel(fitLME, fitSURV, timeVar=time,
 method=piecewise-PH-GH); summary(fitJM)
 
 Both the lme and coxph functions work fine and both give the same sample
 size (though sometimes I get the unequal sample size error that I see
 others have experienced without a solution).  My current error says Can't
 fit a Cox model with 0 failures despite the coxph function working fine. 
 Previously I had an error message saying there were longitudinal
 measurements after the event (which I wouldn't have thought would be a
 problem!), I dealt with that by removing all measurements after the
 failure time.
 
 I want to know if there is somewhere I can find all the requirements (in
 terms of data structures, lme and coxph limitations etc.) for the
 jointModel function to work or someone who can help me with my errors.
 
 Charles





--
View this message in context: 
http://r.789695.n4.nabble.com/jointModel-error-messages-tp4644812p4645164.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R: machine for moderately large data

2012-10-05 Thread INCOMA GfK
Dear all,

I would like to ask your advice about a suitable computer for the following 
usage.
I (am starting to) work with moderately big data in R:
- cca 2 - 20 million rows * 100 - 1000 columns (market basket data)
- mainly clustering, classification trees, association analysis (e.g. libraries 
rpart, cba, proxy, party)

Can you recommend a sufficient computer for this volume?
I am routinely working in Windows but feel that Mac or some linux machine might 
be needed.

Please, respond directly to my email.
Many thanks!

Zdenek Skala
zdenek.sk...@gfk.com


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Minute Difference

2012-10-05 Thread Rantony
Hi,

Here I have a time along with a date,
for eg:-  10/5/2012 5:05:00 AM

I need to subtract 10 minutes from that date-time,
like this:-  10/5/2012 4:55:00 AM
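
One small sketch (the format string assumes the month/day/year, 12-hour clock
shown above): POSIXct times are stored in seconds, so subtracting 10*60 moves
the time back by 10 minutes.

tm <- as.POSIXct("10/5/2012 5:05:00 AM", format = "%m/%d/%Y %I:%M:%S %p")
tm - 10 * 60   # 2012-10-05 04:55:00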

Thanks in Advance
Antony



--
View this message in context: 
http://r.789695.n4.nabble.com/Minute-Difference-tp4645157.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread fxen3k
I imported the whole dataset with read.csv2() and it works fine. (2 for
German is correct ;) )

I already checked the numbers and I also tried to calculate the mean of a
range of numbers where there is no NA given. (as mentioned in my last post
above).




--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645166.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Missing data (Na) and chi-square tests

2012-10-05 Thread Rerda
Dear everyone

I am a bit of a computer imbecile and are having problems with R.
I am using R in my research project to do chi-square tests on data imported
from excel .
However I have som missing data in one of my variables (columns) and I need
R to exclude these and make chi-square test on the data that I have.

I use a formula to make 2x2 tables which is:
data <- matrix(c(sum(!Variable[Group==1]), sum(Variable[DAAC==1]),
sum(!Variable[Group==0]), sum(Variable[DAAC==0])), 2, 2)

How can I get R to ignore Na's in this formula?
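
A hedged sketch (Variable and Group are the names used above; whether DAAC is
really meant to differ from Group is left alone): table() drops NAs by default,
and chisq.test() accepts the resulting 2x2 table directly.

tab <- table(Group, Variable)   # NAs in either vector are simply left out
tab
chisq.test(tab)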

Many Regards



--
View this message in context: 
http://r.789695.n4.nabble.com/Missing-data-Na-and-chi-square-tests-tp4645167.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread fxen3k
I'm sorry! 

Now I tried it again with just 10 numbers (just random numbers) and Excel
gives a different output than R.

Here are the numbers I used:

0,2006160108532920
0,1321167173880490
0,0563941428921262
0,0264198664609803
0,0200581303857603
-0,2971754213679500
-0,2353086361784190
0,0667195538296534
0,1755852636926560

And this is the command in R:

 nums - as.numeric(as.character(dataSet2$ac_bhar_60d_4d_after_ann[2:10]))
 m - mean(nums, na.rm = T)
 m

The output of R is: 
 print(m, digits= 12)
[1] 0.01667

The output in Excel is:
0,0161584031062386

The numbers are imported correctly. Or does R reduce the imported numbers to
any decimal place? (i don't think so ;-) )

Best Regards,
Felix




--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645165.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Likelihood: Multiple Maxima

2012-10-05 Thread felix1984
Hi everyone,

I estimate structural equation models and use maximum likelihood estimation.
However, the estimates differ depending on the starting values I choose, so I
guess there are multiple local maxima. I'm new to R (and statistics...;),
does anybody maybe know how I solve this best?
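
A generic, hedged sketch of the usual workaround (not SEM-specific; negloglik
and n_par are made-up names for your objective function and its number of free
parameters): refit from several random starting values and keep the best fit.

fits <- lapply(1:20, function(i) {
  start <- runif(n_par, -2, 2)               # random starting values
  optim(start, negloglik, method = "BFGS")   # optim() minimises, so use the negative log-likelihood
})
best <- fits[[which.min(sapply(fits, `[[`, "value"))]]
best$par                                     # estimates from the best of the 20 runs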

Thanks a lot! ;)
Felix



--
View this message in context: 
http://r.789695.n4.nabble.com/Likelihood-Multiple-Maxima-tp4645170.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] smoothScatter plot

2012-10-05 Thread John Kane

   In line



   John Kane
   Kingston ON Canada

   -Original Message-
   From: zhyjiang2...@hotmail.com
   Sent: Fri, 5 Oct 2012 05:41:29 +0800
   To: jrkrid...@inbox.com
   Subject: RE: [R] smoothScatter plot

   Hi John,

   Thanks for your email. Your way works good.

   However, I was wondering if you can help with a smoothed scatter plot that
   has shadows with different darker blue color representing higher density of
   points.

   Zhengyu

    Do you mean something like what is being discussed here?
    http://andrewgelman.com/2012/08/graphs-showing-uncertainty-using-lighter-intensities-for-the-lines-that-go-further-from-the-center-to-de-emphasize-the-edges/
    If so I think there has been some discussion and accompanying ggplot2 code
    on the ggplot2 Google Groups site.
    Otherwise can you explain a bit more clearly?
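
    If density shading is the goal, a small sketch using the V1 columns of the
    x and y data frames quoted further down (darker blue where the points are
    denser):

    smoothScatter(x$V1, y$V1,
                  colramp  = colorRampPalette(c("white", "darkblue")),
                  nrpoints = 20)   # also overlay the 20 most isolated raw points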
 _

   Date: Thu, 4 Oct 2012 05:46:46 -0800
   From: jrkrid...@inbox.com
   Subject: RE: [R] smoothScatter plot
   To: zhyjiang2...@hotmail.com
   CC: r-help@r-project.org
   Hi,
   Do you mean something like this?
   =
    scatter.smooth(x, y)
   =
    It looks like invoking dcols <- densCols(x,y) is calling in some
    package that is masking the standard smoothScatter() and applying some other
    version of smoothScatter, but I am not expert enough to be sure.
   Another way to get the same result as mine with smoothScatter is to use the
   ggplot2 package.  it looks a bit more complicated but it is very good and in
   some ways easier to see exactly what is happening.
    To try it you would need to install the ggplot2 package
    (install.packages("ggplot2")), then with your original x and y data frames
    ===
    library(ggplot2)
    xy <- cbind(x, y)
    names(xy) <- c("xx", "yy")
    p <- ggplot(xy, aes(xx, yy)) + geom_point() +
         geom_smooth(method = "loess", se = FALSE)
    p
   
   Thanks for the data set.  However it really is easier to use dput()
   To use dput() simply issue the command dput(myfile) where myfile is the file
   you are working with.  It will give you something like this:
   ==
   1 dput(x)
   structure(c(0.4543462924, 0.2671718761, 0.1641577016, 1.1593356462,
   0.0421177346, 0.3127782861, 0.4515537795, 0.5332559665, 0.0913911528,
   0.1472054054, 0.1340672893, 1.2599304224, 0.3872026125, 0.0368560053,
   0.0371828779, 0.3999714282, 0.0175815783, 0.8871547761, 0.2706762487,
   0.7401904063, 0.0991320236, 0.2565567348, 0.5854167363, 0.7515717421,
   0.7220388222, 1.3528297744, 0.9339971349, 0.0128652431, 0.4102527051
   ), .Dim = c(29L, 1L), .Dimnames = list(NULL, V1))
   1 dput(y)
   structure(list(V1 = c(0.8669898448, 0.6698647266, 0.1641577016,
   0.4779091929, 0.2109900366, 0.2915241414, 0.2363116664, 0.3808731568,
   0.379908928, 0.2565868263, 0.1986675964, 0.7589866876, 0.6496236922,
   0.1327986663, 0.4196107999, 0.3436442638, 0.1910728051, 0.5625817464,
   0.1429791079, 0.6441837334, 0.1477153617, 0.369079266, 0.3839842979,
   0.39044223, 0.4186374286, 0.7611640016, 0.446291999, 0.2943343355,
   0.3019098386)), .Names = V1, class = data.frame, row.names = c(NA,
   -29L))
   1
   ===
    That is your x in dput() form.  You just copy it from the R terminal and
    paste it into your email message.  It is handy if you add the x <- and
    y <- to the output.
   Your method works just fine but it's a bit more cumbersome with a lot of
   data.
   Also, please reply to the R-help list as well.  It is a source of much more
   expertise than me and it also can reply when a single person is unavailable.
   I hope this helps
   John Kane
   Kingston ON Canada

   -Original Message-
   From: zhyjiang2...@hotmail.com
   Sent: Thu, 4 Oct 2012 05:19:14 +0800
   To: jrkrid...@inbox.com
   Subject: RE: [R] smoothScatter plot

   Hi John,

   Thanks for your reply. But I cannot figure out how to use dput(). I included
   data and code below. Is that possible to make a plot similar to attached
   smoothing effect.

   Zhengyu
   ###

    x <- read.table(text="0.4543462924
   0.2671718761
   0.1641577016
   1.1593356462
   0.0421177346
   0.3127782861
   0.4515537795
   0.5332559665
   0.0913911528
   0.1472054054
   0.1340672893
   1.2599304224
   0.3872026125
   0.0368560053
   0.0371828779
   0.3999714282
   0.0175815783
   0.8871547761
   0.2706762487
   0.7401904063
   0.0991320236
   0.2565567348
   0.5854167363
   0.7515717421
   0.7220388222
   1.3528297744
   0.9339971349
   0.0128652431
    0.4102527051", header=FALSE)
    y <- read.table(text="0.8669898448
   0.6698647266
   0.1641577016
   0.4779091929
   0.2109900366
   

Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread Sarah Goslee
If the numbers were imported correctly you wouldn't need to do
as.numeric(as.character(yourdata)).

Please use dput() to provide your data, as in:
dput(dataSet2$ac_bhar_60d_4d_after_ann[2:10])

Otherwise it's impossible for us to diagnose your problem or reproduce
your error.

testdata - c(0.2006160108532920,
0.1321167173880490,
0.0563941428921262,
0.0264198664609803,
0.0200581303857603,
-0.2971754213679500,
-0.2353086361784190,
0.0667195538296534,
0.1755852636926560)


 mean(testdata)
[1] 0.0161584


Sarah


On Fri, Oct 5, 2012 at 9:14 AM, fxen3k f.seha...@gmail.com wrote:
 I'm sorry!

 Now I tried it again with just 10 numbers (just random numbers) and Excel
 gives a different output than R.

 Here are the numbers I used:

 0,2006160108532920
 0,1321167173880490
 0,0563941428921262
 0,0264198664609803
 0,0200581303857603
 -0,2971754213679500
 -0,2353086361784190
 0,0667195538296534
 0,1755852636926560

 And this is the command in R:

 nums - as.numeric(as.character(dataSet2$ac_bhar_60d_4d_after_ann[2:10]))
 m - mean(nums, na.rm = T)
 m

 The output of R is:
 print(m, digits= 12)
 [1] 0.01667

 The output in Excel is:
 0,0161584031062386

 The numbers are imported correctly. Or does R reduce the imported numbers to
 any decimal place? (i don't think so ;-) )

 Best Regards,
 Felix



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-test

2012-10-05 Thread Nico Met
Many many thanks for the input in this context.

Nico

On Thu, Oct 4, 2012 at 4:56 PM, Jose Iparraguirre 
jose.iparragui...@ageuk.org.uk wrote:

 Hi,

 You can also have a look at this paper (no subscription needed):

 Herberich, E.; Sikorski, J.; and Hothorn, T. (2010). A Robust Procedure
 for Comparing Multiple Means under Heteroscedasticity in Unbalanced
 Designs. PLoSONE, Vol. 5, Issue 3, e9788. doi:10.1371/journal.pone.0009788

 In this paper, Herberich et al have included a piece of R code to run
 their procedure. Their procedure is a modification of Tukey, and as it says
 in the title of the paper, it can be used regardless of whether the samples
 had different sizes or distributions.

 José


 José Iparraguirre
 Chief Economist
 Age UK

 T 020 303 31482
 E jose.iparragui...@ageuk.org.uk
 Twitter @jose.iparraguirre@ageuk


 Tavis House, 1- 6 Tavistock Square
 London, WC1H 9NB
 www.ageuk.org.uk | ageukblog.org.uk | @ageukcampaigns


 For a copy of our new Economic Monitor and the full Chief Economist's
 report, visit the Age UK Knowledge Hub
 http://www.ageuk.org.uk/professional-resources-home/knowledge-hub-evidence-statistics/


 For evidence and statistics on the older population, visit the Age UK
 Knowledge Hub
 http://www.ageuk.org.uk/professional-resources-home/knowledge-hub-evidence-statistics/


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of arun
 Sent: 04 October 2012 13:59
 To: John Kane
 Cc: R help
 Subject: Re: [R] t-test

 Hi John,
 You are right.  With more than two groups, the type 1 error rate should be
 a problem.
 A.K.



 - Original Message -
 From: John Kane jrkrid...@inbox.com
 To: arun smartpink...@yahoo.com
 Cc:
 Sent: Thursday, October 4, 2012 8:40 AM
 Subject: Re: [R] t-test

 My stats are lousy, but isn't Nico doing some multiple t-tests when an ANOVA
 with some post hoc comparisons, complete with a Tukey or Bonferroni
 correction, looks more suitable?

 Of course I have no idea of the topic area and maybe he has already done
 this.

 John Kane
 Kingston ON Canada


  -Original Message-
  From: smartpink...@yahoo.com
  Sent: Thu, 4 Oct 2012 05:31:55 -0700 (PDT)
  To: nicome...@gmail.com
  Subject: Re: [R] t-test
 
  HI,
  Try this:
  sapply(split(dat,dat$Name),function(x) t.test(x[,2],dat[,2])$p.value)
  #CTK100 CTK103 CTK121
  #0.86330310 0.32706859 0.02023357
  A.K.
 
 
 
  - Original Message -
  From: Nico Met nicome...@gmail.com
  To: Rui Barradas ruipbarra...@sapo.pt
  Cc: r-help@r-project.org; r-help r-h...@stat.math.ethz.ch
  Sent: Thursday, October 4, 2012 6:37 AM
  Subject: Re: [R] t-test
 
  Dear Rui,
 
  Many thanks for help.
 
  mean for CTK and all  = comparison between mean of all groups ( which
  means second col) vs. each groups like CTK100, CTK121 etc.
 
  Regards
 
  Nico
 
  On Thu, Oct 4, 2012 at 12:28 PM, Rui Barradas ruipbarra...@sapo.pt
  wrote:
 
  Hello,
 
  I'm not quite sure I understand, but something like this?
 
  tapply(dat$Score, dat$Name, FUN = mean)
  sapply(unique(dat$Name), function(un){
   with(dat, t.test(Score[Name == un], Score[Name != un])$p.value)})
 
  My doubt is in what you mean by mean for CTK and all. The ?t.test
  gives
  a confidence interval for the difference in the means, so maybe you'll
  have
  to look there for what you want.
 
  Hope this helps,
 
  Rui Barradas
  Em 04-10-2012 10:34, Nico Met escreveu:
 
  Dear Group,
 
  I want to do a t-test calculation on a large data set.
 
  I am pasting some part of it
 
  structure(list(Name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L,
  3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
  2L), .Label = c("CTK100", "CTK103", "CTK121"), class = "factor"),
Score = c(236.9726, 207.0055, 237.3464, 224.4774, 236.5034,
206.7382, 233.94, 240.31, 240.9, 235.15, 223.36, 248.67,
249.25, 201.4051, 244.1689, 182.2756, 229.001, 241.3211,
196.0453, 232.6055, 225.0783, 196.0453, 232.6055, 225.0783
)), .Names = c("Name", "Score"), class = "data.frame", row.names
  =
  c(NA,
  24L))
 
 
  I want to compare groups with CTK100 and with all the groups and want
  to
  save the p-values and mean for each of that particular group (for
  example:
  mean for CTK and all)
  Similarly, for other groups like that CTK121 etc...
 
  Is there any way to automate this process?
 
  Thanks for your advice !
 
  Nico
 
   [[alternative HTML version deleted]]
 
  __**
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/**listinfo/r-help
 https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/**
  posting-guide.html http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing 

Re: [R] problems with printing and plotting aareg

2012-10-05 Thread Terry Therneau

It's a bug in summary.aareg which no one found until now.

What's wrong:
If dfbeta=TRUE then there is a second estimate of variance calculated, labeled as 
test.var2. If maxtime is set, then both estimates of variance need to be recalculated by 
the summary routine.  An incorrect if-then-else flow led it to look for test.var2 when it 
wasn't relevant.  My test cases with maxtime also happened to have dfbeta=TRUE.


Short term solution: set dfbeta=TRUE.  A bit more computation time though.
Long term: I'll fix it, and a new version of survival will one day appear.  (With 200+ 
packages that depend on survival, new releases now require a lot of checking.  No 
overnight fixes).
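
As a rough illustration of the short-term workaround (the lung data set and the
covariates below are only stand-ins, not Freya's data):

library(survival)
fit <- aareg(Surv(time, status) ~ age + sex, data = lung, dfbeta = TRUE)
summary(fit, maxtime = 400)   # with dfbeta=TRUE, summary finds the variance it looks for
plot(fit, maxtime = 400)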


Terry T

On 10/05/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hi all,

I've ventured into the world of nonparametric survival and I would like to use the 
maxtime option for printing and plotting my aareg fit.

However, my fit does not have test.var2 and this stops the print and plot 
when adding a maxtime.

My code is as follows:

Response <- Surv(Time, Event)
Model <- aareg(Response ~ Factor1*Factor2)
Model2 <- aareg(Response ~ Factor1 + Factor2)  # Just did this to see if the 
interaction term had anything to do with it

Model, print(Model), summary(Model), and plot(Model) seem to work fine, but as 
soon as I try summary/print/plot(Model, maxtime=400) it tells me that test.var2 
is not found and when I look at  summary(Model), there is indeed a NULL under 
test.var2.

Anyone know why it doesn't include test.var2? Is this a compatibility problem? 
I'm using R version 2.13 (I know it's quite old, but updating is a pain when 
you don't have admin rights to your computer) and just updated the survival 
package (no warning messages).

Any input would be much appreciated.

Cheers,

Freya


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread William Dunlap
You need to show us the verbatim output of the following R command
  dput(dataSet2$ac_bhar_60d_4d_after_ann[2:10])
to make any further progress.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of fxen3k
 Sent: Friday, October 05, 2012 6:15 AM
 To: r-help@r-project.org
 Subject: Re: [R] Calculating the mean in one column with empty cells
 
 I'm sorry!
 
 Now I tried it again with just 10 numbers (just random numbers) and Excel
 gives a different output than R.
 
 Here are the numbers I used:
 
 0,2006160108532920
 0,1321167173880490
 0,0563941428921262
 0,0264198664609803
 0,0200581303857603
 -0,2971754213679500
 -0,2353086361784190
 0,0667195538296534
 0,1755852636926560
 
 And this is the command in R:
 
  nums <- as.numeric(as.character(dataSet2$ac_bhar_60d_4d_after_ann[2:10]))
  m <- mean(nums, na.rm = T)
  m
 
 The output of R is:
  print(m, digits= 12)
 [1] 0.01667
 
 The output in Excel is:
 0,0161584031062386
 
  The numbers are imported correctly. Or does R reduce the imported numbers to
  a certain number of decimal places? (I don't think so ;-) )
 
 Best Regards,
 Felix
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Calculating-the-mean-in-
 one-column-with-empty-cells-tp4645135p4645165.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing data (Na) and chi-square tests

2012-10-05 Thread Rui Barradas

Hello,

There are two ways,

1.
?sum  # see argument na.rm
sum(whatever, na.rm = TRUE)

2.
?table  # produces the 2x2 contingency table, if there are only 2 values
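
For illustration, a tiny made-up example of both points:

x <- c(1, NA, 0, 1, NA)
sum(x)                 # NA
sum(x, na.rm = TRUE)   # 2
grp <- c(1, 1, 0, NA, 0)
table(x, grp)          # pairs containing NA are simply dropped from the cross-tabulation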

Also, you should provide us with a data example, especially since your 
code clearly doesn't work.

Use ?dput like this

dput( head(MyData, 20) )  # Then paste the output of this in a post.


Hope this helps,

Rui Barradas
Em 05-10-2012 14:26, Rerda escreveu:

Dear everyone

I am a bit of a computer imbecile and am having problems with R.
I am using R in my research project to do chi-square tests on data imported
from Excel.
However, I have some missing data in one of my variables (columns) and I need
R to exclude these and run the chi-square test on the data that I have.

I use a formula to make 2x2 tables, which is:
data <- matrix(c(sum(!Variable[Group==1]), sum(Variable[DAAC==1]),
sum(!Variable[Group==0]), sum(Variable[DAAC==0])),2,2)

How can I get R to ignore NAs in this formula?

Many Regards



--
View this message in context: 
http://r.789695.n4.nabble.com/Missing-data-Na-and-chi-square-tests-tp4645167.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Minute Difference

2012-10-05 Thread David Winsemius

On Oct 5, 2012, at 5:56 AM, Rantony wrote:

 Hi,
 
 Here i have a time along with date,
 for eg:-  10/5/2012 5:05:00 AM
 
 i need to do minus 10 minutes along current date
 Like this :- 10/5/2012 4:55:00 AM
 

There are both 'cut' and 'seq' methods for vectors of class POSIXt. These 
support operations that specify particular units. 

?cut.POSIXt  # has link to the seq method help page

I am not posting code when questions fail to adhere to the reproducible code 
request.

-- 

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] LMMs with some variance terms forced constant

2012-10-05 Thread Ben Bolker
Erica E.M. Moodie, Dr. erica.moodie at mcgill.ca writes:

 I have been asked to help perform a meta-analysis with individual-
 and aggregate-level data. I found a nice article on this, and the
 idea is easy to understand: use a mixed effects models, but for the
 studies where there is only aggregate level data, force the variance
 to be that which was observed. Unfortunately, I am struggling to see
 how to implement this in R. I am familiar with standard mixed
 model formulations using nlme and lmer, but not able to see how to
 force this specific variance structure on my model so that some
 variance terms are not estimated.

  No solutions, but a few ideas:

 * I assume this is beyond the capacities/flexibility of the metafor package?

 * It's possible that the regress package
 http://cran.r-project.org/web/packages/regress/index.html
could be made to do this

 * There *may* be a way to do this with the pdStruct structures in nlme
(read the relevant bits of Pinheiro and Bates 2000, then stare at
your computer really hard for a while and see if anything comes to you),
although I suspect not

 * you can 'roll your own' mixed models in a flexible way with
WinBUGS/JAGS, maybe now Stan (if you want to be an early adopter!),
all of which have good R interfaces; it's conceivable that AS-REML
can do this too
 
 * follow-ups to r-sig-mixed-models please

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing data (Na) and chi-square tests

2012-10-05 Thread David Winsemius

On Oct 5, 2012, at 6:26 AM, Rerda wrote:

 Dear everyone
 
 I am a bit of a computer imbecile and am having problems with R.
 I am using R in my research project to do chi-square tests on data imported
 from Excel.
 However, I have some missing data in one of my variables (columns) and I need
 R to exclude these and run the chi-square test on the data that I have.
 
 I use a formula to make 2x2 tables, which is:
 data <- matrix(c(sum(!Variable[Group==1]), sum(Variable[DAAC==1]),
 sum(!Variable[Group==0]), sum(Variable[DAAC==0])),2,2)

This might be as simple as:

with(Variable, table( Grp.1= Group == 1, DAAC.1= DAAC == 1) )

table will produce an object that inherits from matrix and the counts of 
logical values will get cross tabulated. The NA values will neither be equal to 
1 nor not-equal to 1, but will rather be ...NA. So they will not appear in the 
counts. This, of course, depends on my guess regarding your data structure being 
correct and my guess regarding your goals also being correct. Neither of those 
critical bits of information has yet been offered.
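
A tiny made-up demonstration of that behaviour:

Group <- c(1, 0, NA, 1, 0)
DAAC  <- c(1, 1, 0, NA, 0)
table(Grp.1 = Group == 1, DAAC.1 = DAAC == 1)   # pairs with an NA simply drop out of the counts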

-- 
David.
 
 How can I get R to ignore NAs in this formula?
 
 Many Regards
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Missing-data-Na-and-chi-square-tests-tp4645167.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in lmer: asMethod(object) : matrix is not symmetric [1, 2]

2012-10-05 Thread Ben Bolker
Ben Bolker bbolker at gmail.com writes:

 

  PS this error message is listed in http://glmm.wikidot.com/faq --
if you google 'lmer matrix is not symmetric' it's in the hits (although
pretty far from the top, alas).  I've added a little bit more discussion.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread PIKAL Petr
Hi

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of fxen3k
 Sent: Friday, October 05, 2012 3:18 PM
 To: r-help@r-project.org
 Subject: Re: [R] Calculating the mean in one column with empty cells
 
 I imported the whole dataset with read.csv2() and it works fine. (2 for
 German is correct ;) )
 
 I already checked the numbers and I also tried to calculate the mean of
 a range of numbers where there is no NA given. (as mentioned in my last
 post above).

But as Sarah pointed out, the result in R from your values (when correctly read) 
is the same as in Excel. Therefore the problem seems to be in ***your*** data.

output from
str(dataSet2$ac_bhar_60d_4d_after_ann[2:10])

can be helpful in diagnosing what may be going on.
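
A small sketch of one common cause (the values below are made up): if the column
was read with the wrong decimal separator it comes in as a factor or character,
and as.numeric(as.character(.)) then yields NAs, which silently changes the mean.

ok  <- read.table(text = "0,2006\n0,1321\n-0,2972", dec = ",")  # numeric column
bad <- read.table(text = "0,2006\n0,1321\n-0,2972")             # wrong dec -> factor
str(ok$V1)
str(bad$V1)
as.numeric(as.character(bad$V1))   # all NA, with a coercion warning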

Regards
Petr

 
 
 
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-
 empty-cells-tp4645135p4645166.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R: machine for moderately large data

2012-10-05 Thread PIKAL Petr
Hi

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Skála, Zdeněk (INCOMA GfK)
 Sent: Friday, October 05, 2012 3:38 PM
 To: r-help@r-project.org
 Subject: [R] R: machine for moderately large data
 
 Dear all,
 
 I would like to ask your advice about a suitable computer for the
 following usage.
 I (am starting to) work with moderately big data in R:
 - cca 2 - 20 million rows * 100 - 1000 columns (market basket data)
 - mainly clustering, classification trees, association analysis (e.g.
 libraries rpart, cba, proxy, party)

If I compute correctly, such a big matrix (20e6*1000) needs about 160 GB just 
to be in memory. Are you prepared for this?
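
For reference, the back-of-envelope arithmetic (assuming dense double-precision
storage):

20e6 * 1000 * 8 / 2^30   # ~149 GiB, i.e. roughly 160 GB in decimal units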

Maybe some suitable database interface would be preferable.

Regards
Petr

 
 Can you recommend a sufficient computer for this volume?
 I am routinely working in Windows but feel that Mac or some linux
 machine might be needed.
 
 Please, respond directly to my email.
 Many thanks!
 
 Zdenek Skala
 zdenek.sk...@gfk.com
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple graphs boxplot

2012-10-05 Thread David Gramaje


Dear all

I am trying to represent a dependent variable (treatment) against different 
independent variables (v1, v2, v3, ..., v20). I am using the following command:

boxplot(v1 ~ treatment, data = y, main = "xx", xlab = "xx", ylab = "xx")

However, it provides me only one graph, for v1 ~ treatment. For the other 
comparisons, I have to repeat the same command but change the parameters. My 
intention is to get the different plots on just one sheet using only one command. 
Is it possible to issue the same command for all the comparisons at once?

Thanks
David
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

2012-10-05 Thread Brian S Cade
Another alternative is to put the data in a linear model structure (one 
column for the response, another column for an indicator variable 
identifying the group) and estimate all possible quantile regressions with rq() 
in the quantreg package, using a model with y ~ intercept + indicator (0,1) 
variable for group.  The estimated quantiles for the intercept will be 
the quantiles of the ecdf for one group, and the estimated quantiles for 
the indicator grouping variable will be the differences in quantiles 
(ecdf) between the two groups.  There is useful built-in graphing 
capability in quantreg with the plot.rqs() function.
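
A hedged sketch of that layout, assuming 'a' and 'b' are the two samples from the
quoted post further down (the other names here are my own):

library(quantreg)
y   <- c(a, b)
grp <- rep(0:1, c(length(a), length(b)))
fit <- rq(y ~ grp, tau = seq(0.05, 0.95, by = 0.05))
plot(fit)   # plot.rqs: the intercept panel shows the quantiles of group 0,
            # the grp panel the quantile (ecdf) differences between the groups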

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  brian_c...@usgs.gov
tel:  970 226-9326



From:
user1234 mehenderso...@gmail.com
To:
r-help@r-project.org
Date:
10/05/2012 06:46 AM
Subject:
Re: [R] Kolmogorov-Smirnov test and the plot of max distance between two 
ecdf curves
Sent by:
r-help-boun...@r-project.org



Rui, 

Your response nearly answered a similar question of mine except that I 
also
have ecdfs of different lengths. 

Do you know how I can adjust  x <- seq(min(loga, logb), max(loga, logb),
length.out=length(loga)) 
to account for this?  It must be in length.out() but I'm unsure how to
proceed.

Any advice is much appreciated.

-L


Rui Barradas wrote
 Hello,
 
 Try the following.
 (i've changed the color of the first ecdf.)
 
 
 loga <- log10(a+1) # do this
 logb <- log10(b+1) # only once
 
 f.a <- ecdf(loga)
 f.b <- ecdf(logb)
 # (2) max distance D
 
 x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
 x0 <- x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )]
 y0 <- f.a(x0)
 y1 <- f.b(x0)
 
 plot(f.a, verticals=TRUE, do.points=FALSE, col="blue")
 plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
 ## alternative, use standard R plot of ecdf
 #plot(f.a, col="blue")
 #lines(f.b, col="green")
 
 points(c(x0, x0), c(y0, y1), pch=16, col="red")
 segments(x0, y0, x0, y1, col="red", lty="dotted")
 ## alternative, down to x axis
 #segments(x0, 0, x0, y1, col="red", lty="dotted")
 
 
 Hope this helps,
 
 Rui Barradas
 maxbre wrote
 Hi all, 
 
 given this example 
 
 #start 
 
 a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 
 
 
760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
 

 length(a)
 
 b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 
  3220,490,20790,290,740,5350,940,3910,0,640,850,260) 
 length(b)
 
 out <- ks.test(log10(a+1),log10(b+1)) 
 
 # max distance D 
 out$statistic 
 
 f.a <- ecdf(log10(a+1)) 
 f.b <- ecdf(log10(b+1)) 
 
 plot(f.a, verticals=TRUE, do.points=FALSE, col="red") 
 plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE) 
 
 #inverse of ecdf a
 x.a <- get("x", environment(f.a))
 y.a <- get("y", environment(f.a))
 
 # inverse of ecdf b
 x.b <- get("x", environment(f.b))
 y.b <- get("y", environment(f.b))
 
 
 #end
 
 I want to plot the max distance between the two ecdf curves as in the
 above given chart
 
 Is that possible and how? 
 
 
 Thanks for your help
 
  PS: this is an amended version of a previous thread (but no reply
  followed) that I've deleted from the Nabble repository because I realised
  it was not clear enough (now I hope it's a little better, sorry for that)





--
View this message in context: 
http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4645140.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Setup Alias Working Directory

2012-10-05 Thread Faisal Ghulam
I have 2 questions:

1-   I want to set up an alias for dbConnect in the user's .Rprofile.

mydb = dbConnect(MySQL(), user='ghulam', password='password',
dbname='ghulam_test', host='localhost')


2- I want to set up two working directories; one is the user's personal directory.

I have set up .Rprofile in the user's home directory with this command

setwd("/home/ghulam/R")

I also want to set up a shared working directory for all users.

Can you please help me to figure it out.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survival predictions

2012-10-05 Thread Mitja Kurki
Dear All,

I have built a survival cox-model, which includes a covariate * time 
interaction. (non-proportionality detected)
I am now wondering how I could most easily get survival predictions from my 
model.

My model was specified:
coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex + 
ageC + HHcat_alt + Main_Branch + Acute_seizure + TreatmentType_binary + 
ICH + IVH_dummy + IVH_dummy:log(event_time_mod) 

And now I was hoping to get a prediction using survfit and providing new.data 
for the combination of variables 
I am doing the predictions:
  survfit(cox, new.data=new)

Now as I have event_time_mod in the right-hand side in my model I need to 
specify it in the new data frame passed on 
to survfit. This event_time would need to be set at individual times of the 
predictions. Is there an easy way to specify event_time_mod to be the correct 
time to  survfit?
Or are there any other options for achieving predictions from my model?

Of course I could create as many rows in the new data frame as there are 
distinct times in the predictions and set event_time_mod to the correct 
values, but that feels really cumbersome and
I thought that there must be a better way.


Best,

Mitja

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Test for Random Points on a Sphere

2012-10-05 Thread Lorenzo Isella

Dear All,
I implemented an algorithm for (uniform) random rotations.
In order to test it, I can apply it to a unit vector (0,0,1) in Cartesian  
coordinates.
The result is supposed to be a set of random, uniformly distributed,  
points on a sphere (not the point of the algorithm, but a way to test it).
This is what the points look like when I plot them, but other than  
eyeballing them, can anyone suggest a test to ensure that I am really  
generating uniform random points on a sphere?

Many thanks

Lorenzo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Broken Links on http://www.r-project.org

2012-10-05 Thread Uwe Ligges



On 04.10.2012 21:03, Hasan Diwan wrote:

The R Graphics Gallery has moved to http://gallery.r-enthusiasts.com/


Thanks, fixed.

Uwe Ligges




and
there's another R Graphics Manual at http://rgm2.lab.nig.ac.jp/RGM2 -- H

On 26 September 2012 04:56, Viechtbauer Wolfgang (STAT) 
wolfgang.viechtba...@maastrichtuniversity.nl wrote:


I was not sure who I should contact about this, so I am posting this here.

There are a few broken links on the R website.

1) http://www.r-project.org/search.html - link to the Nabble R Forum. I
believe the correct/new URL should be: http://r.789695.n4.nabble.com/

2) http://www.r-project.org/other-docs.html - link to Auswertung
ökologischer Daten. Not sure if there is a new URL.

3) http://www.r-project.org/other-projects.html - link to Jim Lindsey's
R page. I believe the correct/new URL should be:
http://www.commanster.eu/rcode.html

Best,
Wolfgang

--
Wolfgang Viechtbauer, Ph.D., Statistician
Department of Psychiatry and Psychology
School for Mental Health and Neuroscience
Faculty of Health, Medicine, and Life Sciences
Maastricht University, P.O. Box 616 (VIJV1)
6200 MD Maastricht, The Netherlands
+31 (43) 388-4170 | http://www.wvbauer.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple graphs boxplot

2012-10-05 Thread Rui Barradas

Hello,

I've once written a function that does more or less what you want, but 
it has no formula interface.


# Input:
#x - matrix or data.frame of numeric vectors to graph
#by - a factor or coercible to factor
multi.boxplot <- function(x, by, col=0, ...){
x <- as.data.frame(x)
uniq.by <- unique(by)
by <- factor(by)
len <- length(uniq.by) - 1
n <- ncol(x)
n1 <- n + 1
col <- rep(col, n)[seq_len(n)]
boxplot(x[[ 1 ]] ~ by, at = 0:len*n1 + 1,
xlim = c(0, (len + 1)*n1), ylim = range(unlist(x)), xaxt = "n", 
col=col[1], ...)

for(i in seq_len(n)[-1])
boxplot(x[[i]] ~ by, at = 0:len*n1 + i, xaxt = "n", add = TRUE, 
col=col[i], ...)

axis(1, at = 0:len*n1 + n1/2, labels = uniq.by, tick = TRUE)
}


a <- matrix(data=runif(300,max=2), nrow=100, ncol=3)
fac <- sample(letters[1:4], 100, TRUE)

multi.boxplot(a, fac)


Hope this helps,

Rui Barradas
Em 05-10-2012 17:01, David Gramaje escreveu:


Dear all

I am trying to represent a dependent variable (treatment) against different 
independent variables (v1, v2, v3, ..., v20). I am using the following command:

boxplot(v1 ~ treatment, data = y, main = "xx", xlab = "xx", ylab = "xx")

However, it provides me only one graph, for v1 ~ treatment. For the other 
comparisons, I have to repeat the same command but change the parameters. My 
intention is to get the different plots on just one sheet using only one command. 
Is it possible to issue the same command for all the comparisons at once?

Thanks
David

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] smoothScatter plot

2012-10-05 Thread JiangZhengyu


Hi John, Thanks for your link. Those plots look pretty but way too complicated 
in terms of making R code.  Maybe my description is not clear.  But could you 
take a look at the attached png? I saw several publications showing smoothed 
plots like this but am not sure how to make one... Thanks, Best, Zhengyu  Date: Fri, 
5 Oct 2012 06:36:38 -0800
From: jrkrid...@inbox.com
Subject: RE: [R] smoothScatter plot
To: zhyjiang2...@hotmail.com
CC: r-help@r-project.org





In line
 John Kane
Kingston ON Canada

-Original Message-
From: zhyjiang2...@hotmail.com
Sent: Fri, 5 Oct 2012 05:41:29 +0800
To: jrkrid...@inbox.com
Subject: RE: [R] smoothScatter plot


Hi John,
 
Thanks for your email. Your way works well. 
 
However, I was wondering if you could help with a smoothed scatter plot that has 
shading, with darker blue representing a higher density of points.
 
Zhengyu 
 
Do you mean something like what is being discussed here? 
http://andrewgelman.com/2012/08/graphs-showing-uncertainty-using-lighter-intensities-for-the-lines-that-go-further-from-the-center-to-de-emphasize-the-edges/
 

If so, I think there has been some discussion and accompanying ggplot2 code on 
the ggplot2 Google Groups site.  

Otherwise can you explain a bit more clearly?
Date: Thu, 4 Oct 2012 05:46:46 -0800
From: jrkrid...@inbox.com
Subject: RE: [R] smoothScatter plot
To: zhyjiang2...@hotmail.com
CC: r-help@r-project.org






Hi,

Do you mean something like this?  
=
scatter.smooth(x, y)
=

It looks like invoking dcols <- densCols(x,y) is calling in some package 
that is masking the base smoothScatter()  and applying some other version of 
smoothScatter, but I am not expert enough to be sure.

Another way to get the same result as mine with smoothScatter is to use the 
ggplot2 package.  it looks a bit more complicated but it is very good and in 
some ways easier to see exactly what is happening.

To try it you would need to install the ggplot2 package 
(install.packages("ggplot2"))  then with your original x and y data frames
===
library(ggplot2)
xy  <-  cbind(x, y)
names(xy)  <-  c("xx", "yy")

p  <-  ggplot(xy , aes(xx, yy )) + geom_point( ) + 
 geom_smooth( method="loess", se =FALSE)
p 



Thanks for the data set.  However it really is easier to use dput()

To use dput() simply issue the command dput(myfile) where myfile is the file 
you are working with.  It will give you something like this:
==
1 dput(x)
structure(c(0.4543462924, 0.2671718761, 0.1641577016, 1.1593356462, 
0.0421177346, 0.3127782861, 0.4515537795, 0.5332559665, 0.0913911528, 
0.1472054054, 0.1340672893, 1.2599304224, 0.3872026125, 0.0368560053, 
0.0371828779, 0.3999714282, 0.0175815783, 0.8871547761, 0.2706762487, 
0.7401904063, 0.0991320236, 0.2565567348, 0.5854167363, 0.7515717421, 
0.7220388222, 1.3528297744, 0.9339971349, 0.0128652431, 0.4102527051
), .Dim = c(29L, 1L), .Dimnames = list(NULL, V1))

1 dput(y)
structure(list(V1 = c(0.8669898448, 0.6698647266, 0.1641577016, 
0.4779091929, 0.2109900366, 0.2915241414, 0.2363116664, 0.3808731568, 
0.379908928, 0.2565868263, 0.1986675964, 0.7589866876, 0.6496236922, 
0.1327986663, 0.4196107999, 0.3436442638, 0.1910728051, 0.5625817464, 
0.1429791079, 0.6441837334, 0.1477153617, 0.369079266, 0.3839842979, 
0.39044223, 0.4186374286, 0.7611640016, 0.446291999, 0.2943343355, 
0.3019098386)), .Names = V1, class = data.frame, row.names = c(NA, 
-29L))
1 

===

That is your x in dput() form.  You just copy it from the R terminal and paste 
it into your email message.  It is handy if you add the x <-  and y <-  to 
the output.  

Your method works just fine but it's a bit more cumbersome with a lot of data.

Also, please reply to the R-help list as well.  It is a source of much more 
expertise than me and it also can reply when a single person is unavailable.

I hope this helps


John Kane
Kingston ON Canada

-Original Message-
From: zhyjiang2...@hotmail.com
Sent: Thu, 4 Oct 2012 05:19:14 +0800
To: jrkrid...@inbox.com
Subject: RE: [R] smoothScatter plot


Hi John,
 
Thanks for your reply, but I cannot figure out how to use dput(). I included 
data and code below. Is it possible to make a plot similar to the attached 
smoothing effect?
 
Zhengyu
###
 
x <- read.table(text="0.4543462924
0.2671718761
0.1641577016
1.1593356462
0.0421177346
0.3127782861
0.4515537795
0.5332559665
0.0913911528
0.1472054054
0.1340672893
1.2599304224
0.3872026125
0.0368560053
0.0371828779
0.3999714282
0.0175815783
0.8871547761
0.2706762487
0.7401904063
0.0991320236
0.2565567348
0.5854167363
0.7515717421
0.7220388222
1.3528297744
0.9339971349
0.0128652431
0.4102527051",header=FALSE)

Re: [R] R: machine for moderately large data

2012-10-05 Thread Ista Zahn
On Fri, Oct 5, 2012 at 12:09 PM, PIKAL Petr petr.pi...@precheza.cz wrote:
 Hi

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Skála, Zdeněk (INCOMA GfK)
 Sent: Friday, October 05, 2012 3:38 PM
 To: r-help@r-project.org
 Subject: [R] R: machine for moderately large data

 Dear all,

 I would like to ask your advice about a suitable computer for the
 following usage.
 I (am starting to) work with moderately big data in R:
 - cca 2 - 20 million rows * 100 - 1000 columns (market basket data)
 - mainly clustering, classification trees, association analysis (e.g.
 libraries rpart, cba, proxy, party)

 If I compute correctly, such a big matrix (20e6*1000) needs about 160 GB just 
 to be in memory. Are you prepared for this?

This is not as outrageous as one might think -- you can get a mac pro
with 32 gigs of memory for around $3,500

Best,
Ista


 Maybe some suitable database interface would be preferable.

 Regards
 Petr


 Can you recommend a sufficient computer for this volume?
 I am routinely working in Windows but feel that Mac or some linux
 machine might be needed.

 Please, respond directly to my email.
 Many thanks!

 Zdenek Skala
 zdenek.sk...@gfk.com


   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Format of numbers in plotmath expressions.

2012-10-05 Thread Uwe Ligges



On 05.10.2012 09:30, Rolf Turner wrote:


I want to do something like:

TH <- sprintf("%1.1f", c(0.3,0.5,0.7,0.9,1))
plot(1:10)
legend("bottomright", pch=1:5, legend=parse(text=paste("theta ==", TH)))

Notice that the final 1 comes out in the legend as just plain "1" and NOT
as "1.0" although TH is

[1] "0.3" "0.5" "0.7" "0.9" "1.0"

I can get plotmath to preserve 1.0 as 1.0 and NOT convert it to 1
if I use substitute, as in

text(2,5,labels=substitute(list(theta == a),list(a=TH[5])))

but substitute doesn't work appropriately with vectors.

Can anyone tell me how to get a 1.0 rather than 1 in the legend?

Ta.

 cheers,

 Rolf Turner

P.S. Just figured out a way using sapply():

leg <- sapply(TH,function(x){substitute(list(theta == a),list(a=x))})
plot(1:10)
legend("bottomright",pch=1:5,legend=parse(text=leg))

Note that the use of parse (pace Thomas Lumley! :-) ) is required ---
legend=leg does NOT work.

Getting here required an enormous amount of trial and error.  And it seems
pretty kludgy.

Is there a sexier way?



Not sure what is sexier here. I'd stay with

leg <- lapply(TH, function(x) bquote(list(theta == .(x))))
plot(1:10)
legend("bottomright", pch=1:5, legend=as.expression(leg))


Best,
Uwe Ligges





 R. T.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R: machine for moderately large data

2012-10-05 Thread Steve Lianoglou
Hi,

On Fri, Oct 5, 2012 at 1:41 PM, Ista Zahn istaz...@gmail.com wrote:
 On Fri, Oct 5, 2012 at 12:09 PM, PIKAL Petr petr.pi...@precheza.cz wrote:
[snip]
 If I compute correctly, such a big matrix (20e6*1000) needs about 160 GB 
 just to be in memory. Are you prepared for this?

 This is not as outrageous as one might think -- you can get a mac pro
 with 32 gigs of memory for around $3,500


And even so, I suspect the matrices that will be worked with are
sparse, so you can get more savings there (although I'm not sure which
of the packages the OP had listed work with sparse input).

That having been said, if you don't want to sample from your data,
sometimes R isn't the best solution.

There are projects being developed to specifically deal with such big data.

For one, you might consider looking at the graphlab/graphchi stuff:

http://graphlab.org

(Graphchi is meant to process big data on a modest machine). If you
go to the Toolkits menu, you'll see they have an implementation of
kmeans++ clustering that might be suitable for your clustering
analysis (perhaps some matrix factorizations are useful here, too --
perhaps your market basket data can be viewed as some type of
collaborative filtering problem, in which case their collaborative
filtering toolkit is right up your alley ;-)

The OP also mentioned classification trees. Perhaps rf-ace might be useful:
http://code.google.com/p/rf-ace/

From their website:


RF-ACE implements both Random Forest (RF) and Gradient Boosting Tree
(GBT) algorithms, and is strongly related to ACE, originally outlined
in http://jmlr.csail.mit.edu/papers/volume10/tuv09a/tuv09a.pdf


If you scroll down to the case study section of their main page, you
can see there is some talk about how they used this in a distributed
manner ... perhaps it is applicable in your case as well (in which
case you might be able to rig up AWS to help you).

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test for Random Points on a Sphere

2012-10-05 Thread R. Michael Weylandt
On Fri, Oct 5, 2012 at 5:39 PM, Lorenzo Isella lorenzo.ise...@gmail.com wrote:
 Dear All,
 I implemented an algorithm for (uniform) random rotations.
 In order to test it, I can apply it to a unit vector (0,0,1) in Cartesian
 coordinates.
 The result is supposed to be a set of random, uniformly distributed, points
 on a sphere (not the point of the algorithm, but a way to test it).
  This is what the points look like when I plot them, but other than
 eyeballing them, can anyone suggest a test to ensure that I am really
 generating uniform random points on a sphere?
 Many thanks


Gut says to divide the surface into n bits of equal area and see if
the points appear uniformly in those using something chi-squared-ish,
but I'm not aware of a canonical way to do so.
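
One hedged sketch along those lines (the binning is arbitrary, and the random
points below merely stand in for Lorenzo's rotated vectors): for a uniform
distribution on the sphere the z-coordinate is uniform on [-1, 1] and the
longitude is uniform on (-pi, pi], so each can be binned and checked with
chisq.test().

set.seed(1)
n   <- 10000
pts <- matrix(rnorm(3 * n), ncol = 3)     # stand-in for the rotated unit vectors
pts <- pts / sqrt(rowSums(pts^2))         # normalise to the unit sphere
z   <- pts[, 3]
lon <- atan2(pts[, 2], pts[, 1])
chisq.test(table(cut(z,   breaks = seq(-1, 1, length.out = 11))))
chisq.test(table(cut(lon, breaks = seq(-pi, pi, length.out = 13))))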

Cheers,
Michael

 Lorenzo

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test for Random Points on a Sphere

2012-10-05 Thread Nordlund, Dan (DSHS/RDA)
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of R. Michael Weylandt
 Sent: Friday, October 05, 2012 11:17 AM
 To: Lorenzo Isella
 Cc: r-help@r-project.org
 Subject: Re: [R] Test for Random Points on a Sphere
 
 On Fri, Oct 5, 2012 at 5:39 PM, Lorenzo Isella
 lorenzo.ise...@gmail.com wrote:
  Dear All,
  I implemented an algorithm for (uniform) random rotations.
  In order to test it, I can apply it to a unit vector (0,0,1) in
 Cartesian
  coordinates.
  The result is supposed to be a set of random, uniformly distributed,
 points
  on a sphere (not the point of the algorithm, but a way to test it).
  This is what the points look like when I plot them, but other than
  eyeballing them, can anyone suggest a test to ensure that I am really
  generating uniform random points on a sphere?
  Many thanks
 
 
 Gut says to divide the surface into n bits of equal area and see if
 the points appear uniformly in those using something chi-squared-ish,
 but I'm not aware of a canonical way to do so.
 
 Cheers,
 Michael
 
  Lorenzo
 

I would be more inclined to use a method which is known to produce points 
uniformly distributed on the surface of a sphere and not worry about testing 
your results.  You might find the discussion at the following link useful.

http://mathworld.wolfram.com/SpherePointPicking.html


Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Minute Difference

2012-10-05 Thread arun
Hi,
Try this:
date1 <- "10/5/2012 5:05:00 AM"

 format(as.POSIXct(strptime(date1, format="%m/%d/%Y %H:%M:%S")) - 600,
        format="%m/%d/%Y %r")
#[1] 10/05/2012 04:55:00 AM

A.K.



- Original Message -
From: Rantony antony.akk...@ge.com
To: r-help@r-project.org
Cc: 
Sent: Friday, October 5, 2012 8:56 AM
Subject: [R] Minute Difference

Hi,

Here i have a time along with date,
for eg:-  10/5/2012 5:05:00 AM

i need to do minus 10 minutes along current date
Like this :- 10/5/2012 4:55:00 AM

Thanks in Advance
Antony



--
View this message in context: 
http://r.789695.n4.nabble.com/Minute-Difference-tp4645157.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Minute Difference

2012-10-05 Thread R. Michael Weylandt
Mr Rantony could also probably benefit from taking a look at the
lubridate package.

Cheers,
Michael

On Fri, Oct 5, 2012 at 8:04 PM, arun smartpink...@yahoo.com wrote:
 Hi,
 Try this:
 date1 <- "10/5/2012 5:05:00 AM"

  format(as.POSIXct(strptime(date1, format="%m/%d/%Y %H:%M:%S")) - 600,
         format="%m/%d/%Y %r")
 #[1] 10/05/2012 04:55:00 AM

 A.K.



 - Original Message -
 From: Rantony antony.akk...@ge.com
 To: r-help@r-project.org
 Cc:
 Sent: Friday, October 5, 2012 8:56 AM
 Subject: [R] Minute Difference

 Hi,

 Here i have a time along with date,
 for eg:-  10/5/2012 5:05:00 AM

 i need to do minus 10 minutes along current date
 Like this :- 10/5/2012 4:55:00 AM

 Thanks in Advance
 Antony

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] convert multi dimensional array to list

2012-10-05 Thread Greg Snow
See fortune("toad") for a bit more on this concept.
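
A small illustration of the point David makes below (the data frame and the
variable holding the column name are made up):

df <- data.frame(speed = 1:3, dist = 4:6)
nm <- "dist"
df$nm      # NULL  -- '$' looks for a column literally named "nm"
df[[nm]]   # 4 5 6 -- '[[' evaluates nm and uses its value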

On Thu, Oct 4, 2012 at 3:19 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Oct 4, 2012, at 8:57 AM, anto.r wrote:

 Hi Michael

 thanks! That was the option if I kept it as an array. I prefer the list format
 with the $ sign since it leaves me feeling that the names are there and can be
 easily accessed. Why would you rather not use the $ sign?

 It would be better in programming to learn to use the [[ operator, for which 
 '$' is just a particular application that is less flexible because it won't 
 evaluate its argument.


 I use R-Studio and there names can be selected from a drop-down list, I have
 found it easier but that could be my lack of proper training in R.

 You should be able to do that with column names in dataframes using 
 object[["col"]]


 Cheers
 Anto

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/convert-multi-dimensional-array-to-list-tp4645011p4645036.html
 Sent from the R help mailing list archive at Nabble.com.

 Learn to post context, please. The all too typical habit of Nabble-users in 
 failing to include context is a constant source of annoyance to regular 
 R-help readers.

 --

 David Winsemius, MD
 Alameda, CA, USA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot with some 0 frequencies

2012-10-05 Thread Greg Snow
Others have shown barchart commands that include 0's.  My guess (which
could be wrong, feel free to ignore this e-mail if so) is that your
problem is due to the way you are creating/summarizing the data before
the plotting command is used.  R needs some way to know about what
bars you want, even if they are 0.  This is information that should be
a property of the data rather than a property of the graph (if made a
property of the data then the graph will take care of itself).

Compare the 2 plots created by the following code:

par(mfrow=c(2,1))
set.seed(1)
tmp <- sample( LETTERS[1:7], 10, TRUE )
barplot( table(tmp) )
tmp2 <- factor(tmp, levels=LETTERS[1:7])
barplot( table(tmp2) )

Does that show what your problem is? and what you would like the
results to look like?

The key is to create the data object (the factor 'tmp2' in this case)
which includes the information about the levels, even if they are not
present in this dataset.

On Thu, Oct 4, 2012 at 5:49 PM, Guillaume2883
guillaume.bal@gmail.com wrote:
 Hi all,

 I am back with a new question !
 I recorded the occurrence of 4 different events at 20 places for a given time
 period.
 Now, I want to do some barplots of the frequency of these events for each
 place, so it should be easy. My problem is that I want to see the
 frequencies of the 4 events on my barplots even if the frequency of some of
 them is 0.
 How could I do that ?

 Thanking you in advance for your help !

 Guillaume



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/barplot-with-some-0-frequencies-tp4645102.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] BCA Package

2012-10-05 Thread Greg Snow
Kokila,

Note that the quality of replies on this mailing list tend to
correlate with the quality of the questions.  Follow the advice found
in the last 2 lines of this (and all) message and ask a better
question and you are more likely to get a better reply.

On Fri, Oct 5, 2012 at 5:53 AM, kokila kokila.krish...@quantilez.com wrote:
 Hi... In R's rankscore() function, how do I calculate the estimation and validation 
 holdout? I am waiting for a reply. Please reply.



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/BCA-Package-tp4645146.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] loop for column substraction of a matrix

2012-10-05 Thread eliza botto

Dear useRs,


I have a matrix with 38 columns and 365 rows. What I want to do is the 
following.

1. Subtract from each column, first itself and then all the remaining 
columns. More precisely, from column number 1, I will first subtract 
itself (column 1) and then the remaining 37 columns. Afterwards I will take 
column number 2 and do the same. In this way I want to proceed until the 38th 
column. 
2. I want to print the resulting matrix on A4 paper.

Bundle of thanks in advance.

best regards
eliza


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bayesian inference distribution parameters

2012-10-05 Thread Guillaume2883
Maybe this link could help you 
http://sirs.agrocampus-ouest.fr/bayes_V2/index_menu.html
You will be able to download some code !



--
View this message in context: 
http://r.789695.n4.nabble.com/Bayesian-inference-distribution-parameters-tp4645155p4645207.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Using variables from different environments in one function

2012-10-05 Thread Rik Verdonck
Dear R-community,


I have been experiencing this issue with functions within a function and
I can kind of feel where the problem is (environments, closures etc),
but I can't manage to solve it.

I will sketch it with an example:

#We have two simple inner functions:

## INNER FUNCTION1 #
innerfunction1 <- function()
{
ab <- a+b
cd <- c+d
innerfunctionlist <- list(ab=ab,cd=cd)
}


## INNER FUNCTION2 #
innerfunction2 <- function()
{
print(AB+CD)
}


#And we have a main script with a number of predefined variables.
#This shapes an environment in which variables that will be used
#by the inner function are defined.
#   MAIN SCRIPT 
a=1
b=2
c=3
d=4


#source("filepath/to/innerfunction1")
outcome1 <- innerfunction1()

AB  <- outcome1$ab
CD  <- outcome1$cd

#source("filepath/to/innerfunction2")
innerfunction2()


#



#So far so good. No problem if you run this.
#The problem emerges if you want to make the main script a function.
#So now I wrap it into a function:


main-function()
{
a=1
b=2
c=3
d=4

#source("filepath/to/innerfunction1")
outcome1 <- innerfunction1()

AB <- outcome1$ab
CD <- outcome1$cd

#source("filepath/to/innerfunction2")
innerfunction2()
}



When I now run main() in an environment where none of these variables are 
defined, I get this error
message:

Error in innerfunction1() : object 'a' not found

I can solve this by defining the variables a, b, c and d. However, I
will then get the next error message:

Error in print(AB + CD) : object 'AB' not found


I think I see what happens. The right variables are not present
in the right environment. Probably I did something silly, but I really
don't see how to solve this. Can someone please help me?

Many thanks!
Rik Verdonck
University of Leuven

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with making figures

2012-10-05 Thread megalops
Bert,

Can you help me understand your suggestion?  I don't understand how I can
include all 30 sites under the label called "site" in the xyplot code example
you provided.  





--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-making-figures-tp4645074p4645216.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot with some 0 frequencies

2012-10-05 Thread Guillaume2883
Thanks for all your answers!
Here is another way I discovered this morning:

barplot(table(factor(variable, levels = 1:4)),names=c(1,2,3,4))



--
View this message in context: 
http://r.789695.n4.nabble.com/barplot-with-some-0-frequencies-tp4645102p4645208.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] LaTeX consistent publication graphics from R and Comparison of GLE and R

2012-10-05 Thread clangkamp
Hi Everyone

I am at the moment preparing my thesis and am looking at producing a few
Organigrams / Flow charts (unrelated to the calculations in R) as well as a
range of charts (barcharts, histograms, ...) based on calculations in R. 

For the organigrams I am looking at an open-source package called GLE on
SourceForge, which typesets the text parts of its figures with LaTeX; this is
very neat and matches the style of the thesis, which I wrote in LaTeX. It
also offers a range of graphical features, and I am quite tempted.

It also produces barcharts and histograms, with options for legends etc. I
have done most of my graphs so far with R, but with organigrams and flow
charts I am at a loss (a pointer here would also be very welcome). For some
charts I have used MS Visio, but it would be convenient to use just one
program for graphing throughout the thesis (i.e. the same colour coding etc.).

Does anybody have experience with GLE, ideally feeding it CSV tables
generated within R? Or is there another way to generate 'visually LaTeX
consistent' graphics within R?

Any takers ?



-
Christian Langkamp
christian.langkamp-at-gmxpro.de

--
View this message in context: 
http://r.789695.n4.nabble.com/LaTeX-consistent-publication-graphics-from-R-and-Comparison-of-GLE-and-R-tp4645218.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop for column substraction of a matrix

2012-10-05 Thread Rui Barradas

Hello,

1. Let me rephrase it a bit: for each column, sum all the other columns and
take the negative of that sum.



mat <- matrix(1:30, ncol = 3)
sapply(seq_len(ncol(mat)), function(i) -rowSums(mat[, -i]))


2. write.table() (maybe using sep = "\t"?) and send the file to the printer.
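
A small sketch putting both points together (the matrix contents and the
file name "result.txt" are just made up for illustration):

mat <- matrix(rnorm(365 * 38), nrow = 365, ncol = 38)
res <- sapply(seq_len(ncol(mat)), function(i) -rowSums(mat[, -i]))
# tab-separated text file that can then be opened and printed on A4
write.table(round(res, 2), file = "result.txt", sep = "\t",
            row.names = FALSE, col.names = FALSE)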

Hope this helps,

Rui Barradas

On 05-10-2012 21:05, eliza botto wrote:

Dear useRs,


I have a matrix with 38 columns and 365 rows. what i want to do is the 
following.

1. subtracting from each column, first itself and then all the remaining 
columns. More precisely, from column number 1, i will first subtract 
itself(column 1) and then the remaining 37 columns. Afterwards i will take 
column number 2 and do the same. In this way i want to proceed till 38th column.
2. i want to print the resulting matrix on A4 paper.

Bundle of thanks in advance.

best regards
eliza



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating all possible ratios

2012-10-05 Thread genome1976

Thanks Rui. That is exactly what I did.


Date: Fri, 5 Oct 2012 05:49:20 -0700
From: ml-node+s789695n4645154...@n4.nabble.com
To: genome1...@hotmail.com
Subject: Re: Calculating all possible ratios



Hello,


Comment out the second apply and all following instructions using 'r2'. 

In the end return 'r1', not cbind.
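
A minimal sketch of what that change could look like (this is just your
function from the message below with the 'r2' part dropped; the new name is
only for illustration):

pairwise.ratios.oneway <- function(x, prefix = "probeset", char = ":") {
    n  <- ncol(x)
    cn <- colnames(x)
    if (length(cn) == 0) {
        cn <- gsub(" ", "0", formatC(seq.int(n), width = nchar(n)))
        cn <- paste(prefix, cn, sep = "")
    }
    cmb <- combn(n, 2)                 # each unordered pair appears once
    r1  <- apply(cmb, 2, function(j) x[, j[1]] / x[, j[2]])
    colnames(r1) <- apply(cmb, 2, function(j) paste(cn[j], collapse = char))
    r1                                 # only the a:b ratios, not b:a
}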


Hope this helps,


Rui Barradas

On 04-10-2012 23:38, genome1976 wrote:

 Hi Rui,

   

 A while ago you helped me with calculating all possible ratios from a dataset.

   

 This is the code I am using as suggested by you.

   

 data <- read.table("new_data.txt", header=T, row.names=1, sep="\t")

 pairwise.ratios <- function(x, prefix="probeset", char=":"){
     n <- ncol(x)
     cn <- colnames(x)
     if(length(cn) == 0){
         cn <- gsub(" ", "0", formatC(seq.int(n), width=nchar(n)))
         cn <- paste(prefix, cn, sep="")
     }
     cmb <- combn(n, 2)
     r1 <- apply(cmb, 2, function(j) x[, j[1]]/x[, j[2]])
     r2 <- apply(cmb, 2, function(j) x[, j[2]]/x[, j[1]])
     colnames(r1) <- apply(cmb, 2, function(j) paste(cn[j], collapse=char))
     colnames(r2) <- apply(cmb, 2, function(j) paste(cn[rev(j)], collapse=char))
     cbind(r1, r2)[, order(c(colnames(r1), colnames(r2)))]
 }

 results <- pairwise.ratios(data.t)
 write.table(t(results), "ratios_results.txt", sep="\t")

   

 It works perfectly fine, except that it gives both ratios a:b and b:a
 for any two variables a and b.

 Can you suggest a way to get only one of the two ratios and not both
 (combinations, which do not care about order, rather than permutations)?

   

 Thanks for any help.

   

 Best Regards,

 Som.



   







 Date: Sat, 12 May 2012 15:20:52 -0700

 From: [hidden email]

 To: [hidden email]

 Subject: RE: Calculating all possible ratios



 Hello,



 Nothing wrong on my end; maybe your R session has some conflicting objects.

 Running the function in the previous post on the first 4 rows and first 6
 columns of your dataset, the result was (copy & paste into your session):



 result <- structure(c(8.74714923153198, 1.83094400392095, 9.92065138471113,

 1.77145415014708, 1.01515180575001, 0.167175438316099, 0.222321656865252,

 0.155576771874649, 3.09417748158541, 0.469647988505747, 1.29398633565582,

 0.524043736521509, 3.75969597954255, 0.422694576901317, 9.75471698113208,

 0.290397651827521, 4.9035575319622, 1.00105273231888, 1.01093964697178,

 0.26895145631068, 0.114322960947685, 0.546166347992352, 0.100799832714726,

 0.564507977763338, 0.11605516024473, 0.0913055986191245, 0.0224099858208782,

 0.0878243288779063, 0.353735531392494, 0.256505926724138, 0.130433606169248,

 0.295826869963301, 0.42981957664441, 0.230861553382365, 0.983273839877614,

 0.163931791180376, 0.56058921623124, 0.546741314958369, 0.10190254729944,

 0.151825242718447, 0.9850743448771, 5.98173996175908, 4.49798734905118,

 6.4276947512815, 8.61659229879359, 10.9522309159971, 44.622964422,

 11.3863665430362, 3.04799485560622, 2.8093121408046, 5.82033416762497,

 3.36839317468124, 3.70358005398494, 2.52844904226946, 43.8765935747068,

 1.86658746243623, 4.83036872336483, 5.98803713273998, 4.5471937427,

 1.72873786407767, 0.323187666496628, 2.12925430210325, 0.772805687699305,

 1.90823767237023, 2.82697074863659, 3.89854539725884, 7.66673581578674,

 3.38035554418724, 0.328084543240185, 0.35595902124055, 0.1718114409242,

 0.296877457036954, 1.21508737036511, 0.900024246342843, 7.53850076491586,

 0.554147739185128, 1.58476931628683, 2.13149583692219, 0.781259909100518,

 0.513223300970874, 0.265978952936953, 2.36577437858509, 0.102514506769826,

 3.44355401535389, 2.32655759378615, 4.33160041310018, 1.01701068353905,

 6.10009805175427, 0.270009014365446, 0.395499368696959, 0.0227911949977918,

 0.535737017484743, 0.822986086753186, 1.11108117816092, 0.132652370966651,

 1.8045729131197, 1.30424309801742, 2.36826490573261, 0.103635979283374,

 0.926148867313916, 0.203933571388086, 0.998948374760994, 0.989178733859585,

 3.71814309436142, 1.78383738225087, 1.82901853699522, 9.81329737579089,

 6.58652001534723, 0.207023533247665, 0.166999632405824, 0.219915855047535,

 0.578456699988768, 0.631006664328306, 0.469154094827586, 1.27998376513563,

 1.9484696000908, 0.76672822844154, 0.422250060615857, 9.64915859255482,

 1.07974002376127), .Dim = c(4L, 30L), .Dimnames = list(c(S1,

 S2, S3, S4), c(P1:P2, P1:P3, P1:P4, P1:P5, P1:P6,

 P2:P1, P2:P3, P2:P4, P2:P5, P2:P6, P3:P1, P3:P2,

 P3:P4, P3:P5, P3:P6, P4:P1, P4:P2, P4:P3, P4:P5,

 P4:P6, P5:P1, P5:P2, P5:P3, P5:P4, P5:P6, P6:P1,

 P6:P2, P6:P3, P6:P4, P6:P5)))



 Rui Barradas









 genome1976 wrote

 Hi Rui,

 Thanks once again. I really appreciate it.

 I tried using the code with the following dataset:

 Sample    P1       P2      P3       P4       P5       P6      P7   P8   P9   P10
 S1        5292.9   605.1   5213.9   1710.6   1407.8   1079.4

Re: [R] loop for column substraction of a matrix

2012-10-05 Thread arun
Hi,

Try this:
set.seed(1)
mat1 <- matrix(sample(1:500, 380, replace=TRUE), ncol=38)
list1 <- list()
for(i in 1:ncol(mat1)){
  list1[[i]] <- t(apply(mat1, 1, function(x) x[i] - x))
}
list1
A.K.


- Original Message -
From: eliza botto eliza_bo...@hotmail.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Friday, October 5, 2012 4:05 PM
Subject: [R] loop for column substraction of a matrix


Dear useRs,


I have a matrix with 38 columns and 365 rows. what i want to do is the 
following.

1. subtracting from each column, first itself and then all the remaining 
columns. More precisely, from column number 1, i will first subtract 
itself(column 1) and then the remaining 37 columns. Afterwards i will take 
column number 2 and do the same. In this way i want to proceed till 38th 
column. 
2. i want to print the resulting matrix on A4 paper.

Bundle of thanks in advance.

best regards
eliza


                          
    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] LaTeX consistent publication graphics from R and Comparison of GLE and R

2012-10-05 Thread Marc Schwartz
On Oct 5, 2012, at 3:32 PM, clangkamp christian.langk...@gmxpro.de wrote:

 Hi Everyone
 
 I am at the moment preparing my thesis and am looking at producing a few
 Organigrams / Flow charts (unrelated to the calculations in R) as well as a
 range of charts (barcharts, histograms, ...) based on calculations in R. 
 
 For the Organigrams I am looking at an Opensource package called GLE at
 sourceforge, which produces the text part in Latex figures which is very
 neat and also in the same style of the thesis, which I wrote in LaTeX. It
 also offers a range of graphical features, and I am quite tempted.
 
 It also produces barcharts and histograms with the options of legends etc. I
 have done most of my graphs so far with R, but with Organigrams and flow
 charts I am at a loss (A pointer here would also be very welcome). For some
 charts I have used MS Visio, but it would be convenient to use just one
 program for graphing throughout the thesis (i.e. same colour coding etc.).
 
 Does anybody have any experience with GLE, ideally working with it with CSV
 tables generated within R ? Or does there exist another way to generate
 'visually LaTeX consistent' graphics within R ?
 
 Any takers ?



If you are comfortable in LaTeX, I would suggest that you look at PSTricks:

  http://tug.org/PSTricks/main.cgi

I use that for creating subject disposition flow charts for clinical trials 
with Sweave. I can then use \Sexpr{}'s to fill in various annotations in the 
boxes, etc. so that all content is programmatically created in a reproducible 
fashion.

There are some examples of flow charts and tree diagrams here:

  http://tug.org/PSTricks/main.cgi?file=pst-node/psmatrix/psmatrix#flowchart

and there are various other online resources for using PSTricks.

Keep in mind that since this is PostScript based, you need to use a latex + 
dvips + ps2pdf sequence, rather than just pdflatex.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using variables from different environments in one function

2012-10-05 Thread Ista Zahn
Hi Rik,

I don't fully understand it, but setting local=TRUE in your source()
calls will avoid the specific error you noted.
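
If it helps, here is a rough sketch of the other usual way around it (not
the source() route, just passing everything the inner functions need as
arguments, so they never have to look things up in the calling environment):

innerfunction1 <- function(a, b, c, d) list(ab = a + b, cd = c + d)
innerfunction2 <- function(AB, CD) print(AB + CD)

main <- function() {
    outcome1 <- innerfunction1(a = 1, b = 2, c = 3, d = 4)
    innerfunction2(AB = outcome1$ab, CD = outcome1$cd)
}
main()   # prints 10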

Best,
Ista

On Fri, Oct 5, 2012 at 3:47 PM, Rik Verdonck
rik.verdo...@bio.kuleuven.be wrote:
 Dear R-community,


 I have been experiencing this issue with functions within a function and
 I can kind of feel where the problem is (environments, closures etc),
 but I don't manage to solve it.

 I will sketch it with an example:

 #We have two simple inner functions:

 ## INNER FUNCTION1 #
 innerfunction1 <- function()
 {
 ab <- a + b
 cd <- c + d
 innerfunctionlist <- list(ab = ab, cd = cd)
 }
 

 ## INNER FUNCTION2 #
 innerfunction2 <- function()
 {
 print(AB + CD)
 }
 

 #And we have a main script with a number of predefined variables.
 #This shapes an environment in which variables that will be used
 #by the inner function are defined.
 #   MAIN SCRIPT 
 a=1
 b=2
 c=3
 d=4


 #source("filepath/to/innerfunction1")
 outcome1 <- innerfunction1()

 AB <- outcome1$ab
 CD <- outcome1$cd

 #source("filepath/to/innerfunction2")
 innerfunction2()


 #



 #So far so good. No problem if you run this.
 #The problem emerges if you want to make the main script a function.
 #So now I wrap it into a function:


 main <- function()
 {
 a = 1
 b = 2
 c = 3
 d = 4

 #source("filepath/to/innerfunction1")
 outcome1 <- innerfunction1()

 AB <- outcome1$ab
 CD <- outcome1$cd

 #source("filepath/to/innerfunction2")
 innerfunction2()
 }



 when I now run main() in an environment where all these variables are not 
 defined, I will get this error
 message:

 Error in innerfunction1() : object 'a' not found

 I can solve this by defining the variables a, b, c and d. However, I
 will then get next error message:

 Error in print(AB + CD) : object 'AB' not found


 I think I see what happens. The right variables are not present
 in the right environment. Probably I did something silly, but I really
 don't see how to solve this. Can someone please help me?

 Many thanks!
 Rik Verdonck
 University of Leuven

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop for column substraction of a matrix

2012-10-05 Thread arun
Hi,

Sorry, I think I misunderstood your question (I realised this after reading Rui's solution).

You can also try any of these to get the result if this is what you meant:
set.seed(1)
mat1 <- matrix(sample(1:500, 380, replace=TRUE), ncol=38)
res1 <- t(do.call(rbind, lapply(1:ncol(mat1), function(i)
         mat1[,i] - apply(mat1, 1, sum))))
#or
res1 <- sapply(1:ncol(mat1), function(i) mat1[,i] - apply(mat1, 1, sum))
#or
res1 <- mapply(FUN=function(i) mat1[,i] - rowSums(mat1), 1:ncol(mat1))
A.K  





- Original Message -
From: eliza botto eliza_bo...@hotmail.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Friday, October 5, 2012 4:05 PM
Subject: [R] loop for column substraction of a matrix


Dear useRs,


I have a matrix with 38 columns and 365 rows. what i want to do is the 
following.

1. subtracting from each column, first itself and then all the remaining 
columns. More precisely, from column number 1, i will first subtract 
itself(column 1) and then the remaining 37 columns. Afterwards i will take 
column number 2 and do the same. In this way i want to proceed till 38th 
column. 
2. i want to print the resulting matrix on A4 paper.

Bundle of thanks in advance.

best regards
eliza


                          
    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dúvida função Anova pacote car - Medidas repetidas

2012-10-05 Thread John Fox
Dear Diego,

This is close enough to Spanish for me to understand it (I think).

Using Anova() in the car package for repeated-measures designs requires a
multivariate linear model for all of the responses, which in turn requires
that the data set be in wide format, with each response as a variable. In
your case, there are two crossed within-subjects factors and no
between-subjects factors. If this understanding is correct (but see below),
then you could proceed as follows, where the crucial step is reshaping the
data from long to wide:

- snip --

Pa2$type.day <- with(Pa2, paste(Type, Day, sep="."))
(Wide <- reshape(Pa2, direction="wide", v.names="logbiovolume",
    idvar="Replicate", timevar="type.day", drop=c("Type", "Day")))

day <- ordered(rep(c(0, 2, 4), each=2))
type <- factor(rep(c("c", "t"), 3))
(idata <- data.frame(day, type))

mod <- lm(cbind(logbiovolume.c.0, logbiovolume.t.0, logbiovolume.c.2,
    logbiovolume.t.2, logbiovolume.c.4, logbiovolume.t.4) ~ 1, data=Wide)

Anova(mod, idata=idata, idesign=~day*type)

- snip --

This serves to analyze the data that you showed; you'll have to adapt it for
the full data set.

I'm assuming that the replicates are independent units, and that the
design is therefore entirely within replicate. If that's wrong, then the
analysis I've suggested is also incorrect.

I hope this helps,
 John

---
John Fox
Senator McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Diego Pujoni
 Sent: Friday, October 05, 2012 9:57 AM
 To: r-help@r-project.org
 Subject: [R] Dúvida função Anova pacote car - Medidas repetidas
 
 Hi everyone, I am running a repeated-measures ANOVA and I am using the
 Anova function from the car package.

 I measured algal biovolume every two days for 10 days (in the data set
 below I only included up to day 4). I have 2 treatments (c, t) and the
 experiment was run in triplicate (A, B, C).
 
  Pa2
     Day Type Replicate logbiovolume
  1    0    c         A        19.34
  2    0    c         B        18.27
  3    0    c         C        18.56
  4    0    t         A        18.41
  5    0    t         B        18.68
  6    0    t         C        18.86
  7    2    c         A        18.81
  8    2    c         B        18.84
  9    2    c         C        18.52
  10   2    t         A        18.29
  11   2    t         B        17.91
  12   2    t         C        17.67
  13   4    c         A        19.16
  14   4    c         B        18.85
  15   4    c         C        19.36
  16   4    t         A        19.05
  17   4    t         B        19.09
  18   4    t         C        18.26
  .
  .
  .
 
  Pa2.teste = within(Pa2, {group = factor(Type)
      time = factor(Day)
      id = factor(Replicate)})
  matrix = with(Pa2.teste, cbind(Pa2[,VAR][group=="c"], Pa2[,VAR][group=="t"]))
 matrix
[,1]  [,2]
  [1,] 19.34 18.41
  [2,] 18.27 18.68
  [3,] 18.56 18.86
  [4,] 18.81 18.29
  [5,] 18.84 17.91
  [6,] 18.52 17.67
  [7,] 19.16 19.05
  [8,] 18.85 19.09
  [9,] 19.36 18.26
 [10,] 19.63 18.96
 [11,] 19.94 18.06
 [12,] 19.54 18.37
 [13,] 19.98 17.96
 [14,] 20.99 17.93
 [15,] 20.45 17.74
 [16,] 21.12 17.60
 [17,] 21.66 17.33
 [18,] 21.51 18.12
   model <- lm(matrix ~ 1)
   design <- factor(c("c", "t"))

   options(contrasts=c("contr.sum", "contr.poly"))
   aov <- Anova(model, idata=data.frame(design), idesign=~design, type="III")
   summary(aov, multivariate=F)
 
 Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
 
                    SS num Df Error SS den Df         F    Pr(>F)
  (Intercept) 12951.2      1   6.3312     17 34775.336 < 2.2e-16 ***
  design         19.1      1  17.3901     17    18.697 0.0004606 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 
  The problem is that I think this function is not taking the days or the
  replicates into account. How do I bring these into the analysis? Do you
  know of a corresponding non-parametric function for this test? Something
  like a Friedman test with two groups (treatment and replicate) and one
  block (time)?

  Thank you very much
 
Diego PJ
 
   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop for column substraction of a matrix

2012-10-05 Thread eliza botto

Dear Arun and Barradas,
millions of thanks. you guys always rock.
regards
eliza

 Date: Fri, 5 Oct 2012 14:41:38 -0700
 From: smartpink...@yahoo.com
 Subject: Re: [R] loop for column substraction of a matrix
 To: eliza_bo...@hotmail.com
 CC: ruipbarra...@sapo.pt; r-help@r-project.org
 
 Hi,
 
 Sorry, I think I misunderstand your question (after reading Rui's solution).
 
 You can also try any of these to get the result if this is what you meant:
 set.seed(1)
 mat1 <- matrix(sample(1:500, 380, replace=TRUE), ncol=38)
 res1 <- t(do.call(rbind, lapply(1:ncol(mat1), function(i)
          mat1[,i] - apply(mat1, 1, sum))))
 #or
 res1 <- sapply(1:ncol(mat1), function(i) mat1[,i] - apply(mat1, 1, sum))
 #or
 res1 <- mapply(FUN=function(i) mat1[,i] - rowSums(mat1), 1:ncol(mat1))
 A.K  
 
 
 
 
 
 - Original Message -
 From: eliza botto eliza_bo...@hotmail.com
 To: r-help@r-project.org r-help@r-project.org
 Cc: 
 Sent: Friday, October 5, 2012 4:05 PM
 Subject: [R] loop for column substraction of a matrix
 
 
 Dear useRs,
 
 
 I have a matrix with 38 columns and 365 rows. what i want to do is the 
 following.
 
 1. subtracting from each column, first itself and then all the remaining 
 columns. More precisely, from column number 1, i will first subtract 
 itself(column 1) and then the remaining 37 columns. Afterwards i will take 
 column number 2 and do the same. In this way i want to proceed till 38th 
 column. 
 2. i want to print the resulting matrix on A4 paper.
 
 Bundle of thanks in advance.
 
 best regards
 eliza
 
 
   
 [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop for column substraction of a matrix

2012-10-05 Thread Rui Barradas

Or, using your mapply solution, compute

rs <- rowSums(mat1)

and then use it for all the subtractions. With a larger dataset this would
probably be faster.
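
A minimal sketch of that variant (using the same simulated mat1 as in the
quoted message below), so the row sums are computed once rather than once
per column:

rs   <- rowSums(mat1)
res2 <- sapply(seq_len(ncol(mat1)), function(i) mat1[, i] - rs)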


Rui Barradas
On 05-10-2012 22:41, arun wrote:

Hi,

Sorry, I think I misunderstand your question (after reading Rui's solution).

You can also try any of these to get the result if this is what you meant:
set.seed(1)
mat1 <- matrix(sample(1:500, 380, replace=TRUE), ncol=38)
res1 <- t(do.call(rbind, lapply(1:ncol(mat1), function(i)
         mat1[,i] - apply(mat1, 1, sum))))
#or
res1 <- sapply(1:ncol(mat1), function(i) mat1[,i] - apply(mat1, 1, sum))
#or
res1 <- mapply(FUN=function(i) mat1[,i] - rowSums(mat1), 1:ncol(mat1))
A.K





- Original Message -
From: eliza botto eliza_bo...@hotmail.com
To: r-help@r-project.org r-help@r-project.org
Cc:
Sent: Friday, October 5, 2012 4:05 PM
Subject: [R] loop for column substraction of a matrix


Dear useRs,


I have a matrix with 38 columns and 365 rows. what i want to do is the 
following.

1. subtracting from each column, first itself and then all the remaining 
columns. More precisely, from column number 1, i will first subtract 
itself(column 1) and then the remaining 37 columns. Afterwards i will take 
column number 2 and do the same. In this way i want to proceed till 38th column.
2. i want to print the resulting matrix on A4 paper.

Bundle of thanks in advance.

best regards
eliza


   
 [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

