Re: [R] nth root of matrix

2013-07-02 Thread Spencer Graves



On 7/2/2013 9:24 PM, David Winsemius wrote:

On Jul 2, 2013, at 8:11 PM, Sachinthaka Abeywardana wrote:


Hi all,

I want to do the following:

a=matrix(c(-1,-2,-3))
a^(1/3) #get 3rd root of numbers[,1]

[1,]  NaN
[2,]  NaN
[3,]  NaN


All I get is NaNs, what is the proper way of doing this? Would like to
retain the fact that it is a matrix if possible (not a requirement
though).

?complex

  a=matrix(c(-1+0i,-2+0i,-3+0i))



  I tried that.  The problem is that there are 3 different cube 
roots in the complex plane, and a^(1/3) only gives one of them.  See 
Wikipedia, "roots of unity" or the examples in the help file for 
"newton_raphson {elliptic}".



  I assume that Sachinthaka wants the real roots.  Try the following:


n <- 3 # n must be an odd integer for this to work
a <- matrix(c(-1,-2,-3))
as <- sign(a)
ab <- abs(a)
cr <- as*(ab^(1/n))
> cr
          [,1]
[1,] -1.000000
[2,] -1.259921
[3,] -1.442250
cr^n  # recovers the original a
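
[Editor's sketch, not part of the thread: if all n complex n-th roots are wanted, they can be built from the principal root and the n-th roots of unity mentioned above.]

nth_roots <- function(z, n) {
  # principal root times the n-th roots of unity exp(2*pi*i*k/n), k = 0, ..., n-1
  as.complex(z)^(1/n) * exp(2i * pi * (0:(n - 1)) / n)
}
nth_roots(-1, 3)  # the real root -1 plus the two complex-conjugate cube roots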


  Hope this helps.  Spencer Graves


David Winsemius
Alameda, CA, USA


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] advice on big panel operations with mclapply?

2013-07-02 Thread ivo welch
dear R experts:  I have a very large panel data set, about 2-8GB.  think

NU <- 3;NT <- 3000
ds <- data.frame( unit= rep(1:NU, each=NT ), time=NA,  x=NA)
ds$time <- rep( 1:NT, NU )
ds$x <- rnorm(nrow(ds))

I want to do a couple of operations within each unit first, and then
do some list operations at each time.  not difficult in principle.
think

  ds <- merge back in results of  mclapply( split(1:nrow(ds), ds$unit),
function( ids ) { work on ds[ids,] } )  # same unit
  ds <- merge back in results of  mclapply( split(1:nrow(ds), ds$time),
function( ids ) { work on ds[ids,] } )  # same time
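
[Editor's sketch: a minimal runnable version of the split/compute/merge pattern above, using the toy ds and the parallel package; the within-unit demeaning is only a hypothetical stand-in for "work on ds[ids,]".]

library(parallel)  # mclapply forks on Unix-alikes (it runs serially on Windows)

NU <- 3; NT <- 3000
ds <- data.frame(unit = rep(1:NU, each = NT),
                 time = rep(1:NT, NU),
                 x    = rnorm(NU * NT))

per.unit <- mclapply(split(seq_len(nrow(ds)), ds$unit), function(ids)
  data.frame(row = ids, x.demeaned = ds$x[ids] - mean(ds$x[ids])))

res <- do.call(rbind, per.unit)
ds$x.demeaned <- NA_real_
ds$x.demeaned[res$row] <- res$x.demeaned  # merge the per-unit results back by row index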

the problem is that ds is big.  I can store 1 copy, but not 4.  what I
really want is to declare ds "read-only shared memory" before the
mclapply() and have the spawned processes access the same ds.  right
now, each core wants its own private duplicate of ds, which then runs
out of memory.  I don't think shared data is possible in R across
mclapply.

* I could just run my code single-threaded.  this loses the
parallelism of the task, but the code remains parsimonious and the
memory footprint is still ok.

* I could just throw 120GB of SSD as swapfile.  for $100 or so, this
ain't a bad solution.  it's slower than RAM but faster and safer than
coding more complex R solutions.  it's still likely faster than
single-threaded operations on quad-core machines.  if the swap
algorithm is efficient, it shouldn't be so bad.

* I could pre-split the data before and merge after the mclapply.
within each chunk, I could then use mclapply.  the code would be
uglier and have a layer of extra complexity ( = bugs ), but RAM
consumption drops by orders of magnitude.  I am thinking something
roughly like

## first operation
save(ds, file="ds.Rdata")  ## keep a full copy on disk; we reload it below
mclapply( split(1:nrow(ds), ds$unit), function(di) { dsi <- ds[di,];
save(dsi, file=paste0("@", di[1], ".Rdata")) } )
rm(ds)  ## make space for the mclapply
results <- mclapply( Sys.glob("@*.Rdata"), function( f ) { load(f);
...do whatever... } )  ## run many small-mem processes
system("rm @*.Rdata")  ## remove the temporary files
load("ds.Rdata")  ## since we deleted it, we have to reload the original data
## combine results with the full ds
ds <- data.frame( ds, results )
## now run the second operation on the time units

* I could dump the data into a data base, but then every access (like
the split() or the mclapply()) would also have to query and reload the
data again, just like my .Rdata files.  is it really faster/better
than abusing the file system and R's native file formats?  I doubt it,
but I don't know for sure.

this is a reasonably common problem with large data sets.  I saw some
specific solutions on stackoverflow, a couple requiring even less
parsimonious user code.  is everyone using bigmemory?  or SQL? or ...
?  I am leaning towards the SSD solution.  am I overlooking some
simpler recommended solution?

/iaw


Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nth root of matrix

2013-07-02 Thread David Winsemius

On Jul 2, 2013, at 8:11 PM, Sachinthaka Abeywardana wrote:

> Hi all,
> 
> I want to do the following:
> 
> a=matrix(c(-1,-2,-3))
> a^(1/3) #get 3rd root of numbers[,1]
> 
> [1,]  NaN
> [2,]  NaN
> [3,]  NaN
> 
> 
> All I get is NaNs, what is the proper way of doing this? Would like to
> retain the fact that it is a matrix if possible (not a requirement
> though).

?complex

 a=matrix(c(-1+0i,-2+0i,-3+0i))
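
[Editor's note: on complex input, ^ returns only the principal root, which for a negative real number is not the real-valued root; a quick check:]

a <- matrix(c(-1+0i, -2+0i, -3+0i))
a^(1/3)  # e.g. 0.5+0.866i for -1, not -1; see Spencer Graves' reply for the real roots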


David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] nth root of matrix

2013-07-02 Thread Sachinthaka Abeywardana
Hi all,

I want to do the following:

a=matrix(c(-1,-2,-3))
a^(1/3) #get 3rd root of numbers[,1]

[1,]  NaN
[2,]  NaN
[3,]  NaN


All I get is NaNs, what is the proper way of doing this? Would like to
retain the fact that it is a matrix if possible (not a requirement
though).


Thanks,

Sachin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Changing legend to fill colour in ggplot

2013-07-02 Thread Suparna Mitra
Thanks a lot Brian,
  This is exactly what I was looking for. Thanks a ton.
Best wishes,
Mitra



On 29 June 2013 02:51, Brian Diggs  wrote:

> On 6/27/2013 12:34 AM, Suparna Mitra wrote:
> > Hello R experts,
> >I am having a problem to edit legend in ggplot using four variables.
> >
> > My data structure is :
> > str(df)
> > 'data.frame': 10 obs. of  6 variables:
> >  $ id                        : Factor w/ 2 levels "639A","640": 1 1 1 1 1 2 2 2 2 2
> >  $ species                   : Factor w/ 5 levels "acinetobacter_sp",..: 2 5 1 4 3 2 5 1 4 3
> >  $ genome_coverage_bp        : int  8196 3405 8625 22568 2128 6100 1841 3914 8487 1064
> >  $ genome_length             : int  3571237 2541445 3912725 3479613 5460977 3571237 2541445 3912725 3479613 5460977
> >  $ genome_coverage_percentage: Factor w/ 10 levels "0.02%","0.04%",..: 8 5 7 10 2 6 3 4 9 1
> >  $ avg_depth_coverage        : num  121.96 2.81 19.84 399.63 1.64 ...
> >
> >
> > Now what I did is
> >
> > p=ggplot(df,aes(genome_coverage_percentage,avg_depth_coverage))+
> >   geom_point(aes(colour = species,shape = factor(id)))
> > p+scale_shape_discrete(name  ="",labels=c("Patient 1", "Patient 2"))
> > That creates the plot below.
> > But I want to change the colour-legend keys from circles to filled colour
> > keys, so that it doesn't look as if everything comes from Patient 1, whose
> > symbol is also a circle.
> > Can anybody help me please?
>
> You can change the default aesthetics displayed in the legend using the
> override.aes argument to guide_legend.
>
> scale_colour_discrete(guide=guide_legend(override.aes=aes(shape=15)))
>
> Also, when giving data, especially a data set this small, give the
> output of dput(df) as that gives the complete data in a format that can
> be used to recreate it exactly in someone else's session. If I had that,
> I would test this to make sure it looks right.
>
> > Thanks a lot in advance :)
> > Mitra
> >
> >
> >
>
>
> --
> Brian S. Diggs, PhD
> Senior Research Associate, Department of Surgery
> Oregon Health & Science University
>
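
[Editor's sketch of the override.aes approach Brian describes, on a hypothetical stand-in for the poster's data frame; it uses the list() form documented in ?guide_legend.]

library(ggplot2)

df <- data.frame(  # hypothetical stand-in for the poster's df
  genome_coverage_percentage = runif(10),
  avg_depth_coverage         = rexp(10, 1/100),
  species                    = sample(letters[1:5], 10, replace = TRUE),
  id                         = rep(c("639A", "640"), each = 5)
)

ggplot(df, aes(genome_coverage_percentage, avg_depth_coverage)) +
  geom_point(aes(colour = species, shape = factor(id))) +
  scale_shape_discrete(name = "", labels = c("Patient 1", "Patient 2")) +
  # draw the colour-legend keys as filled squares (shape 15) instead of circles
  scale_colour_discrete(guide = guide_legend(override.aes = list(shape = 15)))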

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multinomial model and p-values

2013-07-02 Thread Duncan Mackay

Hi Luciano

There are a number of types of ordinal regression and you need to be 
specific about that.

Several packages fit ordinal regression models, e.g.:

MASS::polr
arm::bayespolr
ordinal
VGAM
repolr
geepack
etc

Each of them has specific requirements about coding of the variables 
and these MUST be adhered to.


polr may be better suited to your dataset -- but with no data, I cannot say.
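
[Editor's sketch of a proportional-odds fit with MASS::polr on hypothetical data shaped like Luciano's (an ordered outcome and two factor predictors); note the response must be an ordered factor and Hess = TRUE is needed for standard errors.]

library(MASS)
set.seed(1)
d <- data.frame(
  Path  = factor(sample(c("Absent", "Mild", "Severe"), 120, replace = TRUE),
                 levels = c("Absent", "Mild", "Severe"), ordered = TRUE),
  Age   = factor(sample(c("Twenty", "Thirty"), 120, replace = TRUE)),
  Treat = factor(sample(paste0("ATB", 0:3), 120, replace = TRUE))
)
fit <- polr(Path ~ Treat + Age, data = d, Hess = TRUE)
summary(fit)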

Regards

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

At 05:21 3/07/2013, you wrote:

Hello everyone,

I have a dataset which consists of "Pathology scores" (Absent, Mild, Severe)
as outcome variable, and two main effects: Age (two factors: twenty / thirty
days old) and Treatment Group (four factors: infected without ATB; infected
+ ATB1; infected + ATB2; infected + ATB3).

First I tried to fit an ordinal regression model, which seems more
appropriate given the characteristics of my dependent variable (ordinal).
However, the assumption of odds proportionality was severely violated
(graphically), which prompted me to use a multinomial model instead, using
the "nnet" package.


First I chose the outcome level that I need to use as baseline category:

Data$Path <- relevel(Data$Path, ref = "Absent")

Then, I needed to set baseline categories for the independent variables:

Data$Age <- relevel(Data$Age, ref = "Twenty")
Data$Treat <- relevel(Data$Treat, ref = "infected without ATB")

The model:

test <- multinom(Path ~ Treat + Age, data = Data)
# weights:  18 (10 variable)
initial  value 128.537638
iter  10 value 80.623608
final  value 80.619911
converged

> summary1 <- summary(test1)
> summary1

Call:
multinom(formula = Jej_fact ~ Treat + Age, data = Data)

Coefficients:
         (Intercept)  infected+ATB1  infected+ATB2  infected+ATB3  AgeThirty
Moderate   -2.238106     -1.1738540      -1.709608      -1.599301   2.684677
Severe     -1.544361     -0.8696531      -2.991307      -1.506709   1.810771

Std. Errors:
         (Intercept)  infected+ATB1  infected+ATB2  infected+ATB3  AgeThirty
Moderate   0.7880046      0.8430368      0.7731359      0.7718480  0.8150993
Severe     0.6110903      0.7574311      1.1486203      0.7504781  0.6607360

Residual Deviance: 161.2398
AIC: 181.2398

For a while, I could not find a way to get the p-values for the model and
estimates when using nnet:multinom. Yesterday I came across a post where the
author put forward a similar issue regarding estimation of p-values for
coefficients
(http://stats.stackexchange.com/questions/9715/how-to-set-up-and-estimate-a-
multinomial-logit-model-in-r).

There, one blogger suggested that getting p-values from the summary() result
of multinom() is pretty easy, by first getting the t values as follows:

pt(abs(summary1$coefficients / summary1$standard.errors), df=nrow(Data)-10,
lower=FALSE)

         (Intercept)  infected+ATB1  infected+ATB2  infected+ATB3     AgeThirty
Moderate 0.002670340     0.08325396    0.014506395     0.02025858  0.0006587898
Severe   0.006433581     0.12665278    0.005216581     0.02352202  0.0035612114

I AM NOT a statistician, so don't be baffled by a silly question! Is this
procedure correct?

Cheers for now,

Luciano


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Deviance explained by individual terms in GAM

2013-07-02 Thread garth
Hello,

I am fitting GAMs using mgcv. My models have both linear and functional
effects, for instance:

b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3) + x4 + as.factor(x5),data=dat)

I would like to extract the proportion of deviance explained by a single
term in the model, for instance, s(x1). 
Having read through some R-help posts, I realize that, if all terms in the
model are functional (i.e. ‘smooth’), this is achievable by calculating the
difference in deviance explained between models with and without s(x1),
provided that the smoothing parameters are held fixed (i.e. identical) in
both models. For instance, if all model covariates are functional, this
would be achieved (according to a post by Simon Wood;
http://r.789695.n4.nabble.com/Re-variance-explained-by-each-predictor-in-GAM-td896222.html#a896223)
with:

dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
b  <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
## refit without s(x2), holding the remaining smoothing parameters at the values
## estimated in the full model (b$sp[-3] drops the sp of the omitted third smooth)
b2 <- gam(y ~ s(x0) + s(x1) + s(x3), sp = b$sp[-3], data = dat)
summary(b2)$dev.expl
summary(b)$dev.expl

This works great, and the “sp=” option in the above code holds the smoothing
parameters fixed. I’m looking to extend this so as to also hold any
non-smooth parameters fixed. Does anyone know how/if this would be possible?

Thanks so much.




--
View this message in context: 
http://r.789695.n4.nabble.com/Deviance-explained-by-individual-terms-in-GAM-tp4670756.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] acf question...

2013-07-02 Thread Rolf Turner
On 03/07/13 04:21, Venkatesh Nagarajan wrote:
> I am trying to understand lagged correlations.
>
> x <- 1:100
> y <- c(rep(NA, 40), 1:60)
> ccf(x = x, y = y, lag.max = 100, na.action = na.pass, type = "correlation")
>
> I was hoping to see max cor at lag = 40. But I am not seeing it. What am I doing wrong?
Well, I would have expected the correlation to be equal to 1, at any 
(meaningful) lag.
Essentially the idea is cor(x,x+a) = 1 for any constant a.

Experimenting with cor(...,use="pair") and various lags would seem to 
bear this out.

Note that the help for acf/ccf says:
> The lag k value returned by ccf(x, y) estimates the correlation
> between x[t+k] and y[t].

E.g. lag 40 (cor(x[t+40], y[t])):

 cor(x[41:140],y,use="pair") # Yields 1.

E.g. lag -40 (cor(x[t-40], y[t])):

 cor(c(rep(NA,40),x[1:60]),y,use="pair") # Yields 1.

So I am mystified by the output of ccf().

Perhaps someone would care to explain.

Or perhaps not. :-)
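
[Editor's sketch, not part of the thread: Rolf's manual check wrapped as a function of the lag k, i.e. cor(x[t+k], y[t]) over pairwise-complete pairs.]

x <- 1:100
y <- c(rep(NA, 40), 1:60)
lagcor <- function(x, y, k) {
  idx <- seq_along(x) + k
  idx[idx < 1 | idx > length(x)] <- NA  # x[t+k] outside the series becomes NA
  cor(x[idx], y, use = "pairwise.complete.obs")
}
sapply(c(-40, 0, 40), lagcor, x = x, y = y)  # 1 at every lag with two or more overlapping pairs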

 cheers,

 Rolf Turner

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] acf question...

2013-07-02 Thread Jeff Newmiller
Then you are posting in the wrong forum, since this is a forum about getting R 
to do things for which you already understand the theory.

As to the results you are getting, I highly recommend reading the details 
section of ?ccf.

BTW The Posting Guide indicates that you should post in text format because 
HTML mutilates R code. This is a setting in your email program.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Venkatesh Nagarajan  wrote:

>I am trying to understand lagged correlations.
>
>x <- 1:100
>y <- c(rep(NA, 40), 1:60)
>ccf(x = x, y = y, lag.max = 100, na.action = na.pass, type = "correlation")
>
>I was hoping to see max cor at lag = 40. But I am not seeing it. What am I
>doing wrong?
>
>Thanks
>VN
>   [[alternative HTML version deleted]]
>
>
>
>
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recoding variables based on reference values in data frame

2013-07-02 Thread Rui Barradas

Hello,

If you have read in the data as factors (stringsAsFactors = TRUE, the 
default), change the function to the following.



fun <- function(x){
x <- as.character(x)  # work on characters so the factor columns can be recoded below
x[x %in% c("nc", "_")] <- NA
MM <- paste0(x[1], x[1])  # Major Major
Mm <- paste0(x[1], x[2])  # Major minor
mm <- paste0(x[2], x[2])  # minor minor
x[x == MM] <- 0
x[x == Mm] <- 1
x[x == mm] <- 2
x
}



Rui Barradas

Em 02-07-2013 22:15, Rui Barradas escreveu:

Hello,

I'm not sure I understood, but try the following.


Kgeno <- read.table(text = "
SNP_ID SNP1 SNP2 SNP3 SNP4
Maj_Allele C G  C  A
Min_Allele T A T G
ID1 CC GG CT AA
ID2 CC GG CC AA
ID3 CC GG nc AA
ID4 _ _ _ _
ID5 CC GG CC AA
ID6 CC GG CC AA
ID7 CC GG CT AA
ID8 _ _ _ _
ID9 CT GG CC AG
ID10 CC GG CC AA
ID11 CC GG CT AA
ID12 _ _ _ _
ID13 CC GG CC AA
", header = TRUE, stringsAsFactors = FALSE)

Kgeno

fun <- function(x){
 x[x %in% c("nc", "_")] <- NA
 MM <- paste0(x[1], x[1])  # Major Major
 Mm <- paste0(x[1], x[2])  # Major minor
 mm <- paste0(x[2], x[2])  # minor minor
 x[x == MM] <- 0
 x[x == Mm] <- 1
 x[x == mm] <- 2
 x
}

Kgeno[, -1] <- sapply(Kgeno[, -1], fun)
Kgeno


Also, the best way to post data is by using ?dput.

dput(head(Kgeno[, 1:5], 30))  # post the output of this


Hope this helps,

Rui Barradas

Em 02-07-2013 21:46, kathleen askland escreveu:

I'm new to R (previously used SAS primarily) and I have a genetics data
frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3,
...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of
the data is shown below:
SNP_ID     SNP1 SNP2 SNP3 SNP4
Maj_Allele C    G    C    A
Min_Allele T    A    T    G
ID1        CC   GG   CT   AA
ID2        CC   GG   CC   AA
ID3        CC   GG   nc   AA
ID4        _    _    _    _
ID5        CC   GG   CC   AA
ID6        CC   GG   CC   AA
ID7        CC   GG   CT   AA
ID8        _    _    _    _
ID9        CT   GG   CC   AG
ID10       CC   GG   CC   AA
ID11       CC   GG   CT   AA
ID12       _    _    _    _
ID13       CC   GG   CC   AA
The name of the data file is Kgeno.
What I would like to do is recode all of the genotype values to standard
integer notation, based on their values relative to the reference rows
(Maj_Allele and Min_Allele). Standard notation sums the total of minor
alleles in the genotype, so values can be 0, 1 or 2.

Here are the changes I want to make:
1. If the genotype = "nc" or "_" then set equal to NA.
2. If genotype value = a character string comprised of two consecutive
major allele values -- c(Maj_Allele, Maj_Allele) -- then set equal to 0.
3. If genotype  value= c(Maj_Allele, Min_Allele) then set equal to 1.
4. If genotype  value = c(Min_Allele, Min_Allele) then set equal to 2.

I've tried the following ifelse processing but get error (Warning:
Executed
script did not end with R session at the top-level prompt.  Top-level
state
will be restored) and can't seem to fix the code properly. I've
counted the
parentheses. Also, not sure if it would execute properly if I could
fix it.

# change 'nc' and '_' to NA, else leave as is:
Kgeno[,2] <- ifelse(Kgeno[,2] == "nc", "NA", Kgeno[,2])
Kgeno[,2] <- ifelse(Kgeno[,2] == "_", "NA", Kgeno[,2])

#convert genotype strings in the first data column to numeric values
#(two
major alleles=0, 1 minor and 1 major=1, 2 minor alleles=2), else
#leave as
is (to preserve NA values).

Kgeno[,2] <-

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[1,2]), as.character(
Kgeno[1,2]), sep=""), 0,

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[1,2]), as.character(
Kgeno[2,2]), sep=""), 1,

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[2,2]), as.character(
Kgeno[2,2]), sep=""), 2,
 Kgeno[,2])))


Finally, if above code were corrected, this would only change the first
column of data, but I would like to change all 3000+ columns in the same
way.

I would greatly appreciate some suggestions on how to proceed.

Thank you,

Kathleen

---
Kathleen Askland, MD
Assistant Professor
Department of Psychiatry & Human Behavior
The Warren Alpert School of Medicine
Brown University/Butler Hospital

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman

Re: [R] Recoding variables based on reference values in data frame

2013-07-02 Thread Rui Barradas

Hello,

I'm not sure I understood, but try the following.


Kgeno <- read.table(text = "
SNP_ID SNP1 SNP2 SNP3 SNP4
Maj_Allele C G  C  A
Min_Allele T A T G
ID1 CC GG CT AA
ID2 CC GG CC AA
ID3 CC GG nc AA
ID4 _ _ _ _
ID5 CC GG CC AA
ID6 CC GG CC AA
ID7 CC GG CT AA
ID8 _ _ _ _
ID9 CT GG CC AG
ID10 CC GG CC AA
ID11 CC GG CT AA
ID12 _ _ _ _
ID13 CC GG CC AA
", header = TRUE, stringsAsFactors = FALSE)

Kgeno

fun <- function(x){
x[x %in% c("nc", "_")] <- NA
MM <- paste0(x[1], x[1])  # Major Major
Mm <- paste0(x[1], x[2])  # Major minor
mm <- paste0(x[2], x[2])  # minor minor
x[x == MM] <- 0
x[x == Mm] <- 1
x[x == mm] <- 2
x
}

Kgeno[, -1] <- sapply(Kgeno[, -1], fun)
Kgeno


Also, the best way to post data is by using ?dput.

dput(head(Kgeno[, 1:5], 30))  # post the output of this


Hope this helps,

Rui Barradas

Em 02-07-2013 21:46, kathleen askland escreveu:

I'm new to R (previously used SAS primarily) and I have a genetics data
frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3,
...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of
the data is shown below:
SNP_ID     SNP1 SNP2 SNP3 SNP4
Maj_Allele C    G    C    A
Min_Allele T    A    T    G
ID1        CC   GG   CT   AA
ID2        CC   GG   CC   AA
ID3        CC   GG   nc   AA
ID4        _    _    _    _
ID5        CC   GG   CC   AA
ID6        CC   GG   CC   AA
ID7        CC   GG   CT   AA
ID8        _    _    _    _
ID9        CT   GG   CC   AG
ID10       CC   GG   CC   AA
ID11       CC   GG   CT   AA
ID12       _    _    _    _
ID13       CC   GG   CC   AA
The name of the data file is Kgeno.
What I would like to do is recode all of the genotype values to standard
integer notation, based on their values relative to the reference rows
(Maj_Allele and Min_Allele). Standard notation sums the total of minor
alleles in the genotype, so values can be 0, 1 or 2.

Here are the changes I want to make:
1. If the genotype = "nc" or "_" then set equal to NA.
2. If genotype value = a character string comprised of two consecutive
major allele values -- c(Maj_Allele, Maj_Allele) -- then set equal to 0.
3. If genotype  value= c(Maj_Allele, Min_Allele) then set equal to 1.
4. If genotype  value = c(Min_Allele, Min_Allele) then set equal to 2.

I've tried the following ifelse processing but get error (Warning: Executed
script did not end with R session at the top-level prompt.  Top-level state
will be restored) and can't seem to fix the code properly. I've counted the
parentheses. Also, not sure if it would execute properly if I could fix it.

# change 'nc' and '_' to NA, else leave as is:
Kgeno[,2] <- ifelse(Kgeno[,2] == "nc", "NA", Kgeno[,2])
Kgeno[,2] <- ifelse(Kgeno[,2] == "_", "NA", Kgeno[,2])

#convert genotype strings in the first data column to numeric values #(two
major alleles=0, 1 minor and 1 major=1, 2 minor alleles=2), else #leave as
is (to preserve NA values).

Kgeno[,2] <-

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[1,2]), as.character(
Kgeno[1,2]), sep=""), 0,

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[1,2]), as.character(
Kgeno[2,2]), sep=""), 1,

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[2,2]), as.character(
Kgeno[2,2]), sep=""), 2,
 Kgeno[,2])))


Finally, if above code were corrected, this would only change the first
column of data, but I would like to change all 3000+ columns in the same
way.

I would greatly appreciate some suggestions on how to proceed.

Thank you,

Kathleen

---
Kathleen Askland, MD
Assistant Professor
Department of Psychiatry & Human Behavior
The Warren Alpert School of Medicine
Brown University/Butler Hospital

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Recoding variables based on reference values in data frame

2013-07-02 Thread kathleen askland
I'm new to R (previously used SAS primarily) and I have a genetics data
frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3,
...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of
the data is shown below:
SNP_ID     SNP1 SNP2 SNP3 SNP4
Maj_Allele C    G    C    A
Min_Allele T    A    T    G
ID1        CC   GG   CT   AA
ID2        CC   GG   CC   AA
ID3        CC   GG   nc   AA
ID4        _    _    _    _
ID5        CC   GG   CC   AA
ID6        CC   GG   CC   AA
ID7        CC   GG   CT   AA
ID8        _    _    _    _
ID9        CT   GG   CC   AG
ID10       CC   GG   CC   AA
ID11       CC   GG   CT   AA
ID12       _    _    _    _
ID13       CC   GG   CC   AA
The name of the data file is Kgeno.
What I would like to do is recode all of the genotype values to standard
integer notation, based on their values relative to the reference rows
(Maj_Allele and Min_Allele). Standard notation sums the total of minor
alleles in the genotype, so values can be 0, 1 or 2.

Here are the changes I want to make:
1. If the genotype = "nc" or "_" then set equal to NA.
2. If genotype value = a character string comprised of two consecutive
major allele values -- c(Maj_Allele, Maj_Allele) -- then set equal to 0.
3. If genotype  value= c(Maj_Allele, Min_Allele) then set equal to 1.
4. If genotype  value = c(Min_Allele, Min_Allele) then set equal to 2.

I've tried the following ifelse processing but get error (Warning: Executed
script did not end with R session at the top-level prompt.  Top-level state
will be restored) and can't seem to fix the code properly. I've counted the
parentheses. Also, not sure if it would execute properly if I could fix it.

# change 'nc' and '_' to NA, else leave as is:
Kgeno[,2] <- ifelse(Kgeno[,2] == "nc", "NA", Kgeno[,2])
Kgeno[,2] <- ifelse(Kgeno[,2] == "_", "NA", Kgeno[,2])

#convert genotype strings in the first data column to numeric values #(two
major alleles=0, 1 minor and 1 major=1, 2 minor alleles=2), else #leave as
is (to preserve NA values).

Kgeno[,2] <-

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[1,2]), as.character(
Kgeno[1,2]), sep=""), 0,

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[1,2]), as.character(
Kgeno[2,2]), sep=""), 1,

ifelse(Kgeno[,2] == noquote(paste(as.character(Kgeno[2,2]), as.character(
Kgeno[2,2]), sep=""), 2,
Kgeno[,2])))


Finally, if above code were corrected, this would only change the first
column of data, but I would like to change all 3000+ columns in the same
way.

I would greatly appreciate some suggestions on how to proceed.

Thank you,

Kathleen

---
Kathleen Askland, MD
Assistant Professor
Department of Psychiatry & Human Behavior
The Warren Alpert School of Medicine
Brown University/Butler Hospital

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multinomial model and p-values

2013-07-02 Thread Luciano La Sala
Hello everyone, 

I have a dataset which consists of "Pathology scores" (Absent, Mild, Severe)
as outcome variable, and two main effects: Age (two factors: twenty / thirty
days old) and Treatment Group (four factors: infected without ATB; infected
+ ATB1; infected + ATB2; infected + ATB3).

First I tried to fit an ordinal regression model, which seems more
appropriate given the characteristics of my dependent variable (ordinal).
However, the assumption of odds proportionality was severely violated
(graphically), which prompted me to use a multinomial model instead, using
the "nnet" package.  


First I chose the outcome level that I need to use as baseline category: 

Data$Path <- relevel(Data$Path, ref = "Absent")

Then, I needed to set baseline categories for the independent variables:

Data$Age <- relevel(Data$Age, ref = "Twenty") 
Data$Treat <- relevel(Data$Treat, ref = "infected without ATB") 

The model:

test <- multinom(Path ~ Treat + Age, data = Data)
# weights:  18 (10 variable)
initial  value 128.537638 
iter  10 value 80.623608
final  value 80.619911 
converged

> summary1 <- summary(test1)
> summary1

Call:
multinom(formula = Jej_fact ~ Treat + Age, data = Data)

Coefficients:
         (Intercept)  infected+ATB1  infected+ATB2  infected+ATB3  AgeThirty
Moderate   -2.238106     -1.1738540      -1.709608      -1.599301   2.684677
Severe     -1.544361     -0.8696531      -2.991307      -1.506709   1.810771

Std. Errors:
         (Intercept)  infected+ATB1  infected+ATB2  infected+ATB3  AgeThirty
Moderate   0.7880046      0.8430368      0.7731359      0.7718480  0.8150993
Severe     0.6110903      0.7574311      1.1486203      0.7504781  0.6607360

Residual Deviance: 161.2398 
AIC: 181.2398

For a while, I could not find a way to get the p-values for the model and
estimates when using nnet:multinom. Yesterday I came across a post where the
author put forward a similar issue regarding estimation of p-values for
coefficients
(http://stats.stackexchange.com/questions/9715/how-to-set-up-and-estimate-a-
multinomial-logit-model-in-r).

There, one blogger suggested that getting p-values from the summary() result
of multinom() is pretty easy, by first getting the t values as follows: 

pt(abs(summary1$coefficients / summary1$standard.errors), df=nrow(Data)-10,
lower=FALSE) 

         (Intercept)  infected+ATB1  infected+ATB2  infected+ATB3     AgeThirty
Moderate 0.002670340     0.08325396    0.014506395     0.02025858  0.0006587898
Severe   0.006433581     0.12665278    0.005216581     0.02352202  0.0035612114
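
[Editor's sketch, not part of the thread: the more common Wald-test calculation from a multinom() fit uses z statistics and two-sided normal p-values.]

z <- summary1$coefficients / summary1$standard.errors  # Wald z statistics
p <- 2 * pnorm(abs(z), lower.tail = FALSE)             # two-sided p-values
p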

I AM NOT a statistician, so don't be baffled by a silly question! Is this
procedure correct?

Cheers for now, 

Luciano

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What package to use to download pictures and its description from server?

2013-07-02 Thread Jeff Newmiller
Wouldn't this be highly dependent on how the pictures and descriptions were 
formatted online, as well as what you planned to do with them? I generally find 
that a web browser is quite sufficient for my needs.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

C W  wrote:

>Hi R community,
>What package would you recommend to download pictures and descriptions
>of the pictures?
>
>I have looked at package XML and RCurl so far.  Is this what everyone
>uses?
>
>Thanks,
>Mike
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up a function

2013-07-02 Thread David Winsemius

On Jul 2, 2013, at 10:47 AM, Santiago Guallar wrote:

> Hi,
> 
> I have written a function to assign the values of a certain variable 'wd' 
> from a dataset to another dataset. Both contain data from the same time 
> period but differ in the length of their time intervals: 'GPS' has regular 
> 10-minute intervals whereas 'xact' has irregular intervals. I attached 
> simplified text versions from write.table. You can also get a dput of 'xact' 
> in this address: http://www.megafileupload.com/en/file/431569/xact-dput.html).
> The original objects are large and the function takes almost one hour to 
> finish.
> Here's the function:
> 
> fxG= function(xact, GPS){
> l <- rep( 'A', nrow(GPS) )
> v <- unique(GPS$Ring) # the process is carried out for several individuals 
> identified by 'Ring'
> for(k in v ){
>
>df <- xact[xact$Ring == v,]

Simplified a bit , this is starting to look like a case for the split function:

>for(i in 1:nrow(GPS)){
>  if(GPS[i,]$Ring== v){# the code runs along the whole data.frame for 
> each i;

   # After doing the simplification I must ask how GPS[i,]$Ring 
could not == v ( or I)

  
> 
>   u <- df$timepos <= GPS[i,]$timepos
>  # fill vector l for each interval t from xact <= 
> each interval from GPS (take the max if there's > 1 interval)
>   l[i] <- df[max( which(u == TRUE) ),]$wd

 #perhaps tail(df[which(u), 'wd'],1)?
>   }
> }
>  }
> return(l)}
> 
This looks like it will be overwriting the l-object with every iteration of 'k'

> vwd <- fxG(xact, GPS)
> 
> 
> My question is: how can I speed up (optimize) this function?

The first thing you should do is describe in natural language what is to be 
done with the objects 'xact' and 'GPS' (not yet described), rather than 
asking for simplification of obscure nested for-loops with probably redundant 
assignments and extraneous conditions. Make a simple example of such objects 
and repost.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] speed up a function

2013-07-02 Thread Santiago Guallar
Hi,

I have written a function to assign the values of a certain variable 'wd' from 
a dataset to another dataset. Both contain data from the same time period but 
differ in the length of their time intervals: 'GPS' has regular 10-minute 
intervals whereas 'xact' has irregular intervals. I attached simplified text 
versions from write.table. You can also get a dput of 'xact' in this address: 
http://www.megafileupload.com/en/file/431569/xact-dput.html).
The original objects are large and the function takes almost one hour to finish.
Here's the function:

fxG <- function(xact, GPS){
  l <- rep('A', nrow(GPS))
  v <- unique(GPS$Ring)  # the process is carried out for several individuals
                         # identified by 'Ring'
  for (k in 1:length(v)) {
    I  <- v[k]
    df <- xact[xact$Ring == I, ]
    for (i in 1:nrow(GPS)) {
      if (GPS[i, ]$Ring == I) {  # the code runs along the whole data.frame for each i;
                                 # it'd save time to make it stop with the last record of each i instead
        u <- df$timepos <= GPS[i, ]$timepos
        # fill vector l for each interval t from xact <= each interval from GPS
        # (take the max if there's > 1 interval)
        l[i] <- df[max(which(u == TRUE)), ]$wd
      }
    }
  }
  return(l)
}

vwd <- fxG(xact, GPS)


My question is: how can I speed up (optimize) this function?
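
[Editor's sketch, assuming xact is sorted by timepos within each Ring: the inner loop can be replaced by findInterval(), which finds the last xact record at or before each GPS time in one vectorised step.]

fxG2 <- function(xact, GPS) {
  l <- rep(NA_character_, nrow(GPS))
  for (r in unique(GPS$Ring)) {
    xa  <- xact[xact$Ring == r, ]
    gi  <- which(GPS$Ring == r)
    idx <- findInterval(GPS$timepos[gi], xa$timepos)  # index of last xact$timepos <= GPS$timepos
    idx[idx == 0] <- NA                               # no earlier xact record for that GPS row
    l[gi] <- as.character(xa$wd[idx])
  }
  l
}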

Thank you for your help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Outer function in R

2013-07-02 Thread Jeff Newmiller
Read the Posting Guide. The example you provide below is not reproducible [1], 
so we cannot tell what you're giving to the outer() function.

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Dzu  wrote:

>Dear members 
>
>I am trying to apply the function kl.dist (Kullback-Leibler Distance
>measure) to multiple matrices.
>
>I tried the following : 
>
>veckldist <- Vectorize(kl.dist)
>distancematrix <- outer (matrix1,matrix2, "veckldist")
>
>
>But the code complains that the object lengths do not match, even though
>my matrices have the same dimensions.
>
>How could I fix the error?
>
>Thanks
>
>
>
>
>
>--
>View this message in context:
>http://r.789695.n4.nabble.com/Outer-function-in-R-tp4670738.html
>Sent from the R help mailing list archive at Nabble.com.
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] acf question...

2013-07-02 Thread Venkatesh Nagarajan
I am trying to understand lagged correlations.

x <- 1:100
y <- c(rep(NA, 40), 1:60)
ccf(x = x, y = y, lag.max = 100, na.action = na.pass, type = "correlation")

I was hoping to see max cor at lag = 40. But I am not seeing it. What am I doing wrong?
 
Thanks
VN
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Outer function in R

2013-07-02 Thread Dzu
Dear members 

I am trying to apply the function kl.dist (Kullback-Leibler Distance
measure) to multiple matrices.

I tried the following : 

veckldist <- Vectorize(kl.dist)
distancematrix <- outer (matrix1,matrix2, "veckldist")


But the code complains that the object lengths do not match, even though my
matrices have the same dimensions.

How could I fix the error?

Thanks
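
[Editor's note: outer() expands its two arguments elementwise and expects a function vectorised over those elements, so it cannot apply a matrix-to-matrix distance directly. A sketch of one alternative, assuming the matrices are stored in two lists and kl.dist takes two matrices:]

pairwise_dist <- function(list1, list2, distfun) {
  sapply(list2, function(B) sapply(list1, function(A) distfun(A, B)))
}
## usage (hypothetical object names):
## D <- pairwise_dist(list_of_matrices_1, list_of_matrices_2, kl.dist)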





--
View this message in context: 
http://r.789695.n4.nabble.com/Outer-function-in-R-tp4670738.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] What package to use to download pictures and its description from server?

2013-07-02 Thread C W
Hi R community,
What package would you recommend to download pictures and descriptions
of the pictures?

I have looked at package XML and RCurl so far.  Is this what everyone uses?
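
[Editor's sketch: once the image URLs are known, base R's download.file() is often enough; the URL below is only a placeholder.]

img_url <- "http://example.com/picture.jpg"  # hypothetical placeholder
download.file(img_url, destfile = basename(img_url), mode = "wb")  # "wb" keeps binary files intact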

Thanks,
Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating DAGS with plate notation in R

2013-07-02 Thread Raghu Naik
The image did not come through as pointed out by a list member. I have
attached a pdf image file; the link is
http://stackoverflow.com/questions/3461931/software-to-draw-graphical-models-in-plate-notation
.

Cheers.

Raghu


On Tue, Jul 2, 2013 at 9:42 AM, Raghu Naik  wrote:

> I am trying to create a directed graph with plate notation (like the one
> shown below) in R.
>
> [image: The output image]
> Could someone direct me to example code that will get me started?
>
> I could not see any reference to plate notations in igraph, qgraph
> packages though I may be wrong.
>
> The above figure was made in graphviz by a poster on stackoverflow.
>
> I am not sure if this can be replicated in Rgraphviz - I was not able to
> get there.
>
> I would appreciate any help.
>
> Thanks.
>
> Raghu
>


graphics - Software to draw graphical models in plate notation - Stack Overflow _2013-07-02_11-23-35.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] passing of longitude and lattitude arguments to read URL in Google Maps and extract routes

2013-07-02 Thread Franckx Laurent
Dear David

Thank you for the suggestion.
The following works fine:

or_dest <- paste0("origin=", lat_or, ",", long_or, "&destination=", lat_des, ",", long_des)
url_to_google <- paste0("http://maps.googleapis.com/maps/api/directions/json?",
                        or_dest, "&sensor=false&mode=transit&departure_time=1372665319")
route <- url(url_to_google)

Problem solved, thus.
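
[Editor's sketch for looping over many origin-destination pairs, assuming a data frame od with columns lat_or, long_or, lat_des, long_des, and that the API's usage limits allow it.]

library(rjson)
get_route <- function(lat_or, long_or, lat_des, long_des) {
  u <- paste0("http://maps.googleapis.com/maps/api/directions/json?",
              "origin=", lat_or, ",", long_or,
              "&destination=", lat_des, ",", long_des,
              "&region=be&sensor=false&mode=transit&departure_time=1372665319")
  fromJSON(paste(readLines(u, warn = FALSE), collapse = ""))
}
## routes <- lapply(seq_len(nrow(od)), function(i)
##   do.call(get_route, od[i, c("lat_or", "long_or", "lat_des", "long_des")]))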

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: dinsdag 2 juli 2013 15:56
To: Franckx Laurent
Cc: 'r-help'
Subject: Re: [R] passing of longitude and lattitude arguments to read URL in 
Google Maps and extract routes


On Jul 2, 2013, at 1:59 AM, Franckx Laurent wrote:

> Dear all
>
> I try to use Google Maps to calculate travel times per transit between an 
> origin an destination.
>
> The guidelines for the search can be found at:
> https://developers.google.com/maps/documentation/directions/#TravelMod
> es
>
> When I submit the latitude and the longitude of the origin and destination as 
> literals, things work fine.
>
> For instance, the following code executes correctly and we obtain the
> distance and trip duration (the output of the search is in JSON format
> and is converted to an R object with fromJSON)
>
>library(rjson)
>library(gooJSON)
>route <- url('http://maps.googleapis.com/maps/api/directions/json? 
>  
> origin=51.13854,4.384575&destination=51.13156,4.387118&region=be&sensor=false&mode=transit&departure_time=1372665319')
>route_file  <- file("route_file.json")
>L <- readLines(route,-1)
>writeLines(L, route_file)
>close(route)
>routesR_zone1_to_zone20 <- fromJSON( file = route_file )
>routesR_zone1_to_zone20$routes[[1]][[3]][[1]]$distance$value/1000
>routesR_zone1_to_zone20$routes[[1]][[3]][[1]]$duration$value/60
>
>
> However, what I am really interested in is to repeat this operation for 
> thousands of origin-destination pairs. The longitude and the latitude of the 
> origins and destinations then become variables.
>
> For instance:
>
>> lat_or
>[1] 51.13854
>> long_or
>[1] 4.384575
>> lat_des
>[1] 51.13156
>> long_des
>[1] 4.387118
>> route <- url('http://maps.googleapis.com/maps/api/directions/json?   
>>  
>> origin=lat_or,long_or&destination=lat_des,long_des&region=be&sensor=false&mode=transit&departure_time=1372665319')
>> route_file  <- file("route_file.json") L <- readLines(route,-1)
>> writeLines(L, route_file)
>> close(route)
>> routesR_zone1_to_zone20 <- fromJSON( file = route_file )
>> routesR_zone1_to_zone20
>$routes
>list()
>
>$status
>[1] "NOT_FOUND"
>
> Thus, although the coordinates are the same as in the previous example, this 
> time, no route is found.
>
> I suppose that the problem is that, when the url is accessed, lat_or etc are 
> not "translated" in the corresponding numeric values, and that Google tries 
> to calculate the route between the literals " lat_or,long_or" and " 
> lat_des,long_des".

Yes. That seems likely. R would not interpret a text literal. I don't 
understand why you are not using the ordinary text handling function 'paste0'.

?paste0

> lat_or <- 51.13854
long_or <- 4.384575
lat_des <-  51.13156
long_des <- 4.387118

> paste0("origin=",lat_or,",",long_or,"&destination=",lat_des,",",long_d
> es)
[1] "origin=51.13854,4.384575&destination=51.13156,4.387118"


(Also tested on Mac with R 3.0.0 RC using Google JSON Data Interpreter for R, 
Version: 1.0.01.)

--
David.
>
> Does anyone have a suggestion on how to circumvent the problem?
>
>
>
>
> Laurent Franckx, PhD
> VITO NV
> Boeretang 200, 2400 MOL, Belgium
>
--

David Winsemius
Alameda, CA, USA

[http://www.vito.be/e-maildisclaimer/vito.png]


Ontdek hoe VITO de transitie naar een duurzame maatschappij op gang trekt: 
www.vito.be/duurzaamheidsverslag2012
Discover how VITO initiates the transition towards a sustainable society: 
www.vito.be/sustainabilityreport2012


VITO Disclaimer: http://www.vito.be/e-maildisclaimer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cross validation in R

2013-07-02 Thread Max Kuhn
> How do I make a loop so that the process is repeated several times,
> producing ROC curves and area-under-the-ROC values for random splits?


Using the caret package

http://caret.r-forge.r-project.org/
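
[Editor's sketch, not part of the thread: repeated cross-validation with ROC as the selection metric in caret, on simulated two-class data.]

library(caret)
set.seed(1)
dat  <- twoClassSim(200)  # simulated two-class data shipped with caret
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)
fit  <- train(Class ~ ., data = dat, method = "glm",
              metric = "ROC", trControl = ctrl)
fit$results  # ROC, sensitivity, specificity averaged over the resamples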

--

Max

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matching similar character strings

2013-07-02 Thread A M Lavezzi
Dear Arun,
please excuse me for this late reply, we had to stop working on this
temporaririly.

Let me reproduce here two examples of rows from F1 and F2 (sorry, but
with dput() I am not able to produce a clear example)

> F1_ex
Nome.azienda   Indirizzo
17 Alterego Via Edmondo De Amicis, 18

On row 17 of F1 we have a firm named ("Nome.azienda") 'Alterego' whose
address ("indirizzo") is  'Via Edmondo de Amicis, 18'

Below I reproduce the portion of F2 with information on the street
mentioned in F1_ex$Indirizzo.

> F2_ex

  CODICE  STRADA  AREADICIRCOLAZIONE  NUMBER1  BARRATO1  NUMBER2  BARRATO2  SECTION
1  15620  VIA     DE AMICIS EDMONDO         1                  5              1288
2  15620  VIA     DE AMICIS EDMONDO         2                 34              1261
3  15620  VIA     DE AMICIS EDMONDO         7                 17              1287
4  15620  VIA     DE AMICIS EDMONDO        36                 62              1264
5  15620  VIA     DE AMICIS EDMONDO        37                 37              1287
6  15620  VIA     DE AMICIS EDMONDO        64                 84              1262


Line 1 says that the portion of VIA DE AMICIS EDMONDO
("STRADA"+"AREADICIRCOLAZIONE"), with street numbers between 1 and 5
belongs to SECTION 1288 (these are census sections). ("BARRATO1" and
"BARRATO2" refer to the letter in street numbers such as 12/A, 28/D,
etc. In the present example they are empty)

Line 2 says that the portion of VIA DE AMICIS EDMONDO, with street
numbers between 2 and 34 belongs to SECTION 1261,

etc.

Our problem is to assign SECTION 1261 to 'Alterego', exploting the
information on its address. The problem is that the syntax of the
street address in F1 is different from the syntax in F2.

Hope I have clarified the issue

thanks a lot
Mario
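
[Editor's sketch, not part of the thread: base R's adist() gives an approximate string distance that can help link the two spellings, after some normalisation (upper-casing, dropping punctuation, expanding initials).]

street_f1  <- "VIA EDMONDO DE AMICIS"
streets_f2 <- c("VIA DE AMICIS EDMONDO", "VIA KENNEDY JOHN FITZGERALD")
d <- adist(street_f1, streets_f2, ignore.case = TRUE)  # generalised Levenshtein distance
streets_f2[which.min(d)]  # closest candidate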








On Fri, Jun 21, 2013 at 5:25 PM, arun  wrote:
> Dear Mario,
> I didn't find any difference between 1st and 2nd row of F2, except for the 
> last three columns.  Question is that why should F1 1st row should be merged 
> to 2nd row of F2 instead of 1st row of F2. In your previous example, you 
> mentioned about A1, A2, ... and B1, B2, etc.  Here, it is not provided.  As I 
> mentioned before, it is better to provide the output of ?dput() from a subset 
> of dataset.
> dput(head(F1,20))
>
> dput(head(F2,20))
>
> #so that there would be atleast some matching pairs within the example 
> dataset.  Also, please post it to r-help as I will be able to check only 
> after a couple of hours
> Tx.
> Arun
>
>
>
> - Original Message -
> From: Mario Lavezzi 
> To: arun 
> Cc:
> Sent: Friday, June 21, 2013 11:08 AM
> Subject: Re: [R] matching similar character strings
>
> dear Arun
> thank you very much. Let me explain the problem:
>
> Imagine that a portion of the row in F1 is:
>
> 
> F1
>
> 1) Street | J.F. Kennedy | 30
> 
>
> it means that our unit of interest (a firm) has address: J.F. Kennedy Street, 
> 30
>
>
> The F2 database contains the list of all the streets of the city, with 
> additional variables characterizing each street (Census data). The database
> sometimes contains streets divided into several parts, according to the street 
> number. For example:
>
>
> Example of three rows of F2 concerning Kennedy street and Kennedy Road:
>
> F2
>
> 1) Street | Kennedy John Fitzgerald  | 1  | 20 | A12
> 2) Street | Kennedy John Fitzgerald  | 20 | 50 | A15
> 3) Road   | Kennedy John | 1  | 50 | A23
>
>
> We'd like to have an algorithm able to understand that, even though the 
> name is slightly different, element A15 should be added to row 1) of F1,
> producing an output such as:
>
> 1) Street | J.F. Kennedy | 30 | A15
>
>
> hope this clarifies the issue.
>
> thanks a lot! Mario
>
>
>
> Il 21/06/2013 15:29, arun ha scritto:
>> HI,
>> Could you dput() your example datasets and also your expected result?  The 
>> Census section is not clear.
>> A.K.
>>
>>
>>
>>
>> - Original Message -
>> From: A M Lavezzi 
>> To: r-help 
>> Cc:
>> Sent: Friday, June 21, 2013 5:56 AM
>> Subject: [R] matching similar character strings
>>
>> Hello everybody
>>
>> I have this problem: I need to match an addresses database F1 with the
>> information contained in a toponymic database F2.
>>
>> The format of F1 is given by three columns and 800 rows, with the
>> columns being:
>>
>> A1. Street/Road/Avenue
>> A2. Name
>> A3. Number
>>
>> Consider for instance Avenue J. Kennedy , 3011. In F1 this is:
>>
>> A1. Avenue
>> A2. J. Kennedy
>> A3. 3011
>>
>> The format of F2 file is instead given by 2 rows and five columns:
>>
>> B1. Street/Road/Avenu

Re: [R] Recursive partitioning on censored data

2013-07-02 Thread Vicent Giner-Bosch
On 2 July 2013 15:34, Achim Zeileis  wrote:

> On Tue, 2 Jul 2013, Vicent Giner-Bosch wrote:
>
>  I am interested in applying a "classification tree" analysis where the
>> response variable is a censored variable (survival data).
>>
>> I've discovered the package 'party' through this page:
>> http://www.statmethods.net/advstats/cart.html.
>> However, as my sample is not
>> very big I would like to apply 'bootstrap' and use 'random forests', but
>> with my censored response variable.
>>
>> Are there any packages for that??
>>
>
> cforest() in the same package. See citation("party") for the corresponding
> references.
>
> Best,
> Z



Thank you for the reference!

I've also found a package called "randomSurvivalForest", that is mentioned
in "A review of survival trees" by Imad Bou-Hamad (doi:10.1214/09-SS047).

I don't know how to choose "the right package", if any. I am new to R. How
do you usually assess whether a package is good and reliable enough or not?

Thank you in advance.
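
[Editor's sketch, not part of the thread: a conditional random forest for a censored response with party::cforest, on the GBSG2 breast-cancer data from the TH.data package.]

library(party)
library(survival)
data("GBSG2", package = "TH.data")
set.seed(1)
cf <- cforest(Surv(time, cens) ~ ., data = GBSG2,
              controls = cforest_unbiased(ntree = 50))
predict(cf, newdata = GBSG2[1:5, ])  # predicted (median) survival times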

--
vicent

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subset of factors in a regression

2013-07-02 Thread David Winsemius

On Jul 1, 2013, at 9:39 PM, Ben Bolker wrote:

> Philip A. Viton  osu.edu> writes:
> 
>> suppose "state" is a variable in a dataframe containing abbreviations 
>> of the US states, as a factor. What I'd like to do is to include 
>> dummy variables for a few of the states, (say, CA and MA) among the 
>> independent variables in my regression formula. (This would be the 
>> equivalent of, creating, eg, ca<-state=="CA") and then including 
>> that). I know I can create all the necessary dummy variables by using 
>> the "outer" function on the factor and then renaming them 
>> appropriately; but is there a solution that's more direct, ie that 
>> doesn't involve a lot of new variables?
>> 
>> Thanks!
> 
>  You could use model.matrix(~state-1) and select the columns
> you want, e.g.
> 
> state <- state.abb; m <- model.matrix(~state-1)
> m[,colnames(m) %in% c("stateCA","stateMA")]
> 
> -- but this will actually create a bunch of vectors you
> want before throwing them away.
> 
> more compactly:
> 
> m <- sapply(cstates,"==",state)
> storage.mode(m) <- "numeric"
> ## or m[] <- as.numeric(m)


Couldn't this be achieved with "I"?:

lm(Y ~ I(state=="CA") + I(state=="MA") + covariates, data=dfrm)
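
[Editor's sketch with hypothetical data, illustrating the I() approach above.]

set.seed(1)
dfrm <- data.frame(Y = rnorm(50), x = rnorm(50),
                   state = rep(state.abb, length.out = 50))
coef(lm(Y ~ I(state == "CA") + I(state == "MA") + x, data = dfrm))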

-- 
David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] passing of longitude and lattitude arguments to read URL in Google Maps and extract routes

2013-07-02 Thread David Winsemius

On Jul 2, 2013, at 1:59 AM, Franckx Laurent wrote:

> Dear all
> 
> I try to use Google Maps to calculate travel times per transit between an 
> origin an destination.
> 
> The guidelines for the search can be found at: 
> https://developers.google.com/maps/documentation/directions/#TravelModes
> 
> When I submit the latitude and the longitude of the origin and destination as 
> literals, things work fine.
> 
> For instance, the following code executes correctly and we obtain the 
> distance and trip duration (the output of the search is in JSON format and is 
> converted to an R object with fromJSON)
> 
>library(rjson)
>library(gooJSON)
>route <- url('http://maps.googleapis.com/maps/api/directions/json? 
>  
> origin=51.13854,4.384575&destination=51.13156,4.387118&region=be&sensor=false&mode=transit&departure_time=1372665319')
>route_file  <- file("route_file.json")
>L <- readLines(route,-1)
>writeLines(L, route_file)
>close(route)
>routesR_zone1_to_zone20 <- fromJSON( file = route_file )
>routesR_zone1_to_zone20$routes[[1]][[3]][[1]]$distance$value/1000
>routesR_zone1_to_zone20$routes[[1]][[3]][[1]]$duration$value/60
> 
> 
> However, what I am really interested in is to repeat this operation for 
> thousands of origin-destination pairs. The longitude and the latitude of the 
> origins and destinations then become variables.
> 
> For instance:
> 
>> lat_or
>[1] 51.13854
>> long_or
>[1] 4.384575
>> lat_des
>[1] 51.13156
>> long_des
>[1] 4.387118
>> route <- url('http://maps.googleapis.com/maps/api/directions/json?   
>>  
>> origin=lat_or,long_or&destination=lat_des,long_des&region=be&sensor=false&mode=transit&departure_time=1372665319')
>> route_file  <- file("route_file.json")
>> L <- readLines(route,-1)
>> writeLines(L, route_file)
>> close(route)
>> routesR_zone1_to_zone20 <- fromJSON( file = route_file )
>> routesR_zone1_to_zone20
>$routes
>list()
> 
>$status
>[1] "NOT_FOUND"
> 
> Thus, although the coordinates are the same as in the previous example, this 
> time, no route is found.
> 
> I suppose that the problem is that, when the url is accessed, lat_or etc are 
> not "translated" in the corresponding numeric values, and that Google tries 
> to calculate the route between the literals " lat_or,long_or" and " 
> lat_des,long_des".

Yes. That seems likely. R would not interpret a text literal. I don't 
understand why you are not using the ordinary text handling function 'paste0'.

?paste0

> lat_or <- 51.13854
long_or <- 4.384575
lat_des <-  51.13156
long_des <- 4.387118

> paste0("origin=",lat_or,",",long_or,"&destination=",lat_des,",",long_des) 
[1] "origin=51.13854,4.384575&destination=51.13156,4.387118"


(Also tested on Mac with R 3.0.0 RC using Google JSON Data Interpreter for R, 
Version: 1.0.01.)

-- 
David.
> 
> Does anyone have a suggestion on how to circumvent the problem?
> 
> 
> 
> 
> Laurent Franckx, PhD
> VITO NV
> Boeretang 200, 2400 MOL, Belgium
> 
-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] KalmanForecast (stats)

2013-07-02 Thread Giovanni Petris

If you read the 'Warning' section of the help page for KalmanLike, it 
explicitely says that

begin quote

 These functions are designed to be called from other functions
 which check the validity of the arguments passed, so very little
 checking is done.

end quote

So, (1) these functions are not intended to be used directly and, (2) writing 
other functions to call them after checking the validity of the arguments is 
_probably_ not something the typical/average R end user would be comfortable 
with. Of course, (2) just reflects my perception of the larger R users 
community.

Best,
Giovanni
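
[Editor's sketch of the higher-level route mentioned below: StructTS() builds the state-space model object, and predict() (or KalmanForecast() on fit$model) runs the Kalman recursions on it.]

fit  <- StructTS(log10(UKgas), type = "BSM")  # basic structural model for a built-in series
pred <- predict(fit, n.ahead = 8)             # forecasts via the underlying Kalman filter
pred$pred
KalmanForecast(8, fit$model)                  # lower-level interface; fit$model is the 'mod' argument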
 

From: Bert Gunter [gunter.ber...@gene.com]
Sent: Monday, July 01, 2013 10:25 PM
To: Giovanni Petris
Cc: Csima Gabriella; r-help@r-project.org
Subject: Re: [R] KalmanForecast (stats)

Below...

On Mon, Jul 1, 2013 at 7:24 PM, Giovanni Petris  wrote:
>
> Oops...
>
> Correction: The function KalmanForecast does exist in package stats.
>
> The references that I gave in my other reply are still valid, though. It is 
> my impression that Kalman filtering facilities in stats are not meant to be 
> used directly by the end user of R,

... and on what, pray tell, do you base **that**  strange pronouncement??

-- Bert




> but their main purpose is to serve as workhorses for other model
> fitting and forecasting functions (e.g., StructTS).
>
> Best,
> Giovanni
>
> 
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf 
> of Csima Gabriella [csim...@met.hu]
> Sent: Friday, June 28, 2013 6:27 AM
> To: r-help@r-project.org
> Subject: [R] KalmanForecast (stats)
>
> Dear List members,
>
> I would like to use the Kalman-filter program for forecasting - namely for 
> postprocessing numerical model results of 2m temperature. I have looked 
> through the help of the Kalman-filtering programs, mainly the KalmanForecast 
> and I have read about the newer packages like KFAS as well.
>
> I always understand and learn to use new R functions by first trying out the
> example(s); that gives me a base for my own program. My problem is that there
> is no example (with data that I can run immediately), and I do not
> understand, or cannot imagine, what - e.g. the "mod" argument - has to be as
> the input of the function.
>
> Could you send me a simple example of KalmanForecast (with input data) that I 
> can run and can see how it works exactly?
>
> Cheers,
> Gabriella
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating DAGS with plate notation in R

2013-07-02 Thread Raghu Naik
I am trying to create a directed graph with plate notation (like the one
shown below) in R.

[image: The output image]
Could someone direct me to some example code that will get me started?

I could not see any reference to plate notation in the igraph or qgraph packages,
though I may be wrong.

The above figure was made in graphviz by a poster on Stack Overflow.

I am not sure if this can be replicated in Rgraphviz - I was not able to
get there.

I would appreciate any help.

Thanks.

Raghu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recursive partitioning on censored data

2013-07-02 Thread Achim Zeileis

On Tue, 2 Jul 2013, Vicent Giner-Bosch wrote:


I am interested in applying a "classification tree" analysis where the
response variable is a censored variable (survival data).

I've discovered the package 'party' through this page:
http://www.statmethods.net/advstats/cart.html. However, as my sample is not
very big I would like to apply 'bootstrap' and use 'random forests', but
with my censored response variable.

Are there any packages for that??


cforest() in the same package. See citation("party") for the corresponding 
references.
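
A minimal sketch of that, using the GBSG2 breast-cancer data as a stand-in for your own data (an assumption; any data set with a Surv() response will do):

library(party)
library(survival)
data("GBSG2", package = "TH.data")   # also shipped with older versions of ipred
cf <- cforest(Surv(time, cens) ~ ., data = GBSG2,
              controls = cforest_unbiased(ntree = 50))  # small ntree just to keep it quick
head(predict(cf))                    # predicted (median) survival times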


Best,
Z


Looking forward to your answer,


--
vicent
@vginer_upv

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing strings to numbers

2013-07-02 Thread arun
Hi,

Not sure what your data looks like. Maybe this helps.
dat1<- read.table(text="
col1
AA-50
AT-34
TT-57
TT-45
TA-42
",sep="",header=TRUE,stringsAsFactors=FALSE)
vec1<-gsub("\\-.*","",dat1[,1])

vec2<- ifelse(vec1=="AA",-1,ifelse(vec1=="AT",0, ifelse(vec1=="TT",1,NA)))
library(stringr)
abs(vec2 - as.numeric(unlist(str_extract_all(dat1[,1], "[0-9]+"))))
#[1] 51 34 56 44 NA
A.K.





- Original Message -
From: Jeremy Ng 
To: r-help@r-project.org
Cc: 
Sent: Tuesday, July 2, 2013 8:31 AM
Subject: [R] Replacing strings to numbers

Hi guys,

I was wondering if anyone is able to help me with a problem that I was stuck
with for a long time. It involves the replacement of character strings with
numbers. The character string can take on only 3 possible values, for
instance:

AA
AT
TT

I would want R to replace AT with 0. Between AA and TT, I want to compare
the frequency of either value, and then for the one which occurs more, I
want it to be replaced with 1, and the other with -1. So using the same
example, say, I have

AA - 50
AT-34
TT- 57

I would want R to substitute it in this way:
AA= -1
AT=0
TT = 1

The strings are not necessarily AA,AT, or TT.

Any ideas?

Thanks!
Jeremy

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Non-linear modelling with several variables including a categorical variable

2013-07-02 Thread Robbie Weterings
Hello everyone,

I am trying to model some data regarding a predator prey interaction
experiment (n=26). Predation rate is my response variable and I have 4
explanatory variables: predator density (1,2,3,4 5), predator size, prey
density (5,10,15,20,25,30) and prey type (3 categories). I started with
several linear models (glm) and found (as expected) that prey and predator
density were non-linearly related to predation rates. If I use a log
transformation on these variables I get really nice curves and an adjusted
R2 of 0.82, but it is not really the right approach for modelling
non-linear relationships. Therefore I switched to non-linear least square
regression (nls). I have several predator-prey models based on existing
ecological literature e.g.:

model1 <- nls(rates ~ (a * prey)/(1 + b * prey), start = list(a = 0.27,b =
0.13), trace = TRUE) ### Holling's type II functional response

model2 <- nls(rates ~ (a*prey)/(1+ (b * prey) + c * (pred -1 )), start =
list(a=0.22451, b=-0.18938, c=1.06941), trace=TRUE, subset=I1) ###
Beddington-**DeAngelis functional response

These models work perfectly, but now I want to add prey type as well. In
the linear models prey type was the most important variable so I don't want
to leave it out. I understand that you can't add categorical variables in
nls, so I thought I try a generalized additive model (gam).

The problem with the gam models is that the smoothers (both spline and
loess) don't work on both variables because there are only a very
restricted number of values for prey density and predator density. I can
manage to get a model with a single variable smoothed using loess. But for
two variables it is simply not working. The spline function does not work
at all because I have so few values (5) for my variables (see model 4).

model3 <- gam(rates~ lo(pred, span=0.9)+prey) ## this one is actually
working but does not include a smoother for prey.

model4 <- gam(rates~ s(pred)+prey) ## this one gives problems:
*A term has fewer unique covariate combinations than specified maximum
degrees of freedom*

My question is: are there any other possibilities to model data with 2
non-linear related variables in which I can also include a categorical
variable. I would prefer to use nls (model2) with for example different
intercepts for each category but I'm not sure how to get this sorted, if it
is possible at all. The dataset is too small to split it up into the three
categories, moreover, one of the categories only contains 5 data points.
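
One hedged sketch of a possible workaround, with placeholder data and object names: nls() does accept a grouping factor if the parameters are indexed by it, so each prey type gets its own coefficients within a single fit.

## Holling type II with prey-type-specific parameters; 'type' is the 3-level
## factor, and a[type], b[type] pick the coefficient for each observation.
## (Sketch only: 'mydata' and the starting values are placeholders.)
fit <- nls(rates ~ a[type] * prey / (1 + b[type] * prey),
           data = mydata,
           start = list(a = rep(0.27, 3), b = rep(0.13, 3)))
summary(fit)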

Any help would be really appreciated.

With kind regards,
-- 
Robbie Weterings
Project Manager Cat Drop Thailand
Tel: +66(0)890176087
65/13 Mooban Chakangrao, Naimuang Muang
Kamphaeng Phet 62000, Thailand

www.catdropfoundation.org
www.facebook.com/catdropfoundation
Boorn 45, 9204 AZ, Drachten, The Netherlands

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Linear mixed effect model doubt:

2013-07-02 Thread Paqui GG
We have a doubt on whether we are applying an lme model correctly.

Our experimental/sampling design is as follows:

We are studying the relationship between two quantitative and continuous
variables at different sampling stations along a transect in the ocean. At each
station, we take three water samples (fReplica) and we experimentally measure
the dependent variable (Var1) for three different values of the independent one
(Var2). So, for each station (fstation) we have 3 replicates with 3 paired
observations of Var1 and Var2 each (this is all that was logistically possible).
Our data frame looks like:

Var1,Var2,fstation,fReplica
0.9 ,0.2,1,1
1.9 ,0.4,1,1
10.9,0.6,1,1
0.5, 0.3,1,2
0.9, 0.5,1,2
20.1,0.7,1,2
0.2, 0.1,1,3
1.3, 0.6,1,3
40.1,0.9,1,3
... (for 32 stations)

We want to test whether the slopes and intercepts of the relationship between
Var1 and Var2 vary between stations.

We have been studying how to make a correct test. We believe our design is
similar to the model that Zuur et al 2009 apply to an example of Zuur et al
2007. In their case, they used species richness as the dependent variable and
NAP (the height of a sampling station compared to mean tidal level) as the
independent one. These variables were measured for nine beaches and 5
replicates were taken for each beach. The main difference between Zuur et al
and our study is that they have one point per replica and we have a
"regression" of 3 points per replica.

We are applying this model to our data:

## Mlme <- lme(Var1~Var2, random=~1|fstation/fReplica ,data=Data)
## summary(Mlme)

but we are not sure if we are doing the correct test.
Any help or pointers to bibliography with similar analysis will be much
appreciated.
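
A hedged sketch of how the "do the slopes vary between stations?" question is often approached with nlme, assuming the data frame is called Data as above: let the Var2 slope also vary by station and compare the two random-effects structures.

library(nlme)
M1 <- lme(Var1 ~ Var2, random = ~ 1 | fstation/fReplica, data = Data)
M2 <- lme(Var1 ~ Var2, random = list(fstation = ~ Var2, fReplica = ~ 1),
          data = Data)
anova(M1, M2)   # REML likelihood-ratio comparison of the random structures;
                # note that tests of variance components on the boundary are conservative
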
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with heatmap & hclust

2013-07-02 Thread sara sadozai
Hello, I have had a problem for days with the heatmap function and my own
cluster function...

I try to call it like this but it is not working:

heatmap(data, hclustfun=function(x)myclust(x))

where myclust is a simple function

myclust= function(data){

#D=dist(data)
hc=hclust(D)

#mclust= as.mclust(hc$merge)
#m=reorder.mclust(data)

mc=hc
mc$order=leaf

return(mc=mc)

}


and it returns Error in heatmap(data, hclustfun = function(x) myclust(x)) :
  column dendrogram ordering gave index of wrong length


Actually it is working with other cluster functions like diana() or
pam() but not with my own function...

Any help ???
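
For reference, a minimal working hclustfun (a simplified stand-in, not the reordering logic above): heatmap() passes a dist object to hclustfun, so the function should take a distance matrix rather than the raw data, and it must return an hclust object whose order is a valid permutation of 1:n.

set.seed(1)
mat <- matrix(rnorm(100), nrow = 10,
              dimnames = list(paste0("r", 1:10), paste0("c", 1:10)))
myclust <- function(d) hclust(d, method = "average")  # d is already a dist object
heatmap(mat, hclustfun = myclust)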

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple Comparison Kruskal-Wallis Test on a dataframe

2013-07-02 Thread Tom Oates
Hi
I have a dataframe of genomic positions something like below:

chr    start      end        strand  1a   2a   3a   1b   2b   3b   1c   2c   3c   1d   2d   3d
chr1   201199522  201199523  -       51   62   63   46   53   58   53   78   37   27   65   62
chr1   201199584  201199585  -       42   49   35   54   62   55   43   56   26   21   64   63
chr1   201199603  201199604  -       45   46   55   49   41   57   45   52   26   21   62   69
chr1   201199608  201199609  -       40   45   47   63   54   50   40   45   25   22   64   63
chr1   37944086   37944087   -       100  92   96   78   74   76   76   61   65   84   100  100
chr1   37944119   37944120   -       89   95   93   81   67   76   74   51   63   100  65   100

I would like to use a kruskal test with multi-test correction, probably
from the agricolae package, on each genomic coordinate (i.e. row) to
calculate whether treatment groups a,b,c & d are different (where 1,2 & 3
would be replicates within the groups a,b,c, & d).
What is the best way to achieve this? I would have thought the options are:
i) use transform in plyr
ii) assign group numbers by genomic coordinate and treatment group and
perform the kruskal test on each genomic coordinate group
Does anyone have any suggestions to achieve this in the quickest way?
Thanks in advance
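
A base-R sketch of option (ii), with a made-up stand-in for the data frame (the column layout is assumed to be 4 annotation columns followed by the 12 measurements, ordered 1a ... 3d as in the table above):

set.seed(1)
dat <- cbind(data.frame(chr = "chr1", start = 1:6, end = 2:7, strand = "-"),
             matrix(sample(20:100, 6 * 12, replace = TRUE), nrow = 6))
grp  <- factor(rep(letters[1:4], each = 3))   # a a a b b b c c c d d d
vals <- as.matrix(dat[, -(1:4)])              # the 12 measurement columns
pvals <- apply(vals, 1, function(x) kruskal.test(x, grp)$p.value)
data.frame(dat[, 1:4], p = pvals, p.adj = p.adjust(pvals, method = "BH"))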

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Recursive partitioning on censored data

2013-07-02 Thread Vicent Giner-Bosch
I am interested in applying a "classification tree" analysis where the
response variable is a censored variable (survival data).

I've discovered the package 'party' through this page:
http://www.statmethods.net/advstats/cart.html. However, as my sample is not
very big I would like to apply 'bootstrap' and use 'random forests', but
with my censored response variable.

Are there any packages for that??

Looking forward to your answer,


--
vicent
@vginer_upv

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing strings to numbers

2013-07-02 Thread Prof Brian Ripley

On 02/07/2013 13:31, Jeremy Ng wrote:

Hi guys,

I was wondering if anyone is able to help me with a problem that I was stuck
with for a long time. It involves the replacement of character strings with
numbers. The character string can take on only 3 possible values, for
instance:

AA
AT
TT

I would want R to replace AT with 0. Between AA and TT, I want to compare
the frequency of either value, and then for the one which occurs more, I
want it to be replaced with 1, and the other with -1. So using the same
example, say, I have

AA - 50
AT-34
TT- 57

I would want R to substitute it in this way:
AA= -1
AT=0
TT = 1

The strings are not necessarily AA,AT, or TT.


If not, how are we to know which one is to be replaced by 0?  And does 
'more' mean 'greater than' or 'greater than or equal to'?


Adapt the following depending on your answers

> set.seed(1)
> x <- sample(c(rep("AA", 2), "AT", rep("TT", 3)))
> fr <- table(x)
> recode <- if(fr[1] < fr[3]) c(-1, 0, 1) else c(1, 0, -1)  # or <=
> x
[1] "AA" "TT" "AT" "TT" "AA" "TT"
> recode[match(x, names(fr))]  # or however the strings are arranged.
[1] -1  1  0  1 -1  1




Any ideas?

Thanks!
Jeremy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] passing of longitude and lattitude arguments to read URL in Google Maps and extract routes

2013-07-02 Thread Franckx Laurent
Dear all

I am trying to use Google Maps to calculate travel times by transit between an
origin and a destination.

The guidelines for the search can be found at: 
https://developers.google.com/maps/documentation/directions/#TravelModes

When I submit the latitude and the longitude of the origin and destination as 
literals, things work fine.

For instance, the following code executes correctly and we obtain the distance 
and trip duration (the output of the search is in JSON format and is converted 
to an R object with fromJSON)

library(rjson)
library(gooJSON)
route <- url('http://maps.googleapis.com/maps/api/directions/json?  

origin=51.13854,4.384575&destination=51.13156,4.387118&region=be&sensor=false&mode=transit&departure_time=1372665319')
route_file  <- file("route_file.json")
L <- readLines(route,-1)
writeLines(L, route_file)
close(route)
routesR_zone1_to_zone20 <- fromJSON( file = route_file )
routesR_zone1_to_zone20$routes[[1]][[3]][[1]]$distance$value/1000
routesR_zone1_to_zone20$routes[[1]][[3]][[1]]$duration$value/60


However, what I am really interested in is to repeat this operation for 
thousands of origin-destination pairs. The longitude and the latitude of the 
origins and destinations then become variables.

For instance:

> lat_or
[1] 51.13854
> long_or
[1] 4.384575
> lat_des
[1] 51.13156
> long_des
[1] 4.387118
> route <- url('http://maps.googleapis.com/maps/api/directions/json?

origin=lat_or,long_or&destination=lat_des,long_des&region=be&sensor=false&mode=transit&departure_time=1372665319')
> route_file  <- file("route_file.json")
> L <- readLines(route,-1)
> writeLines(L, route_file)
> close(route)
> routesR_zone1_to_zone20 <- fromJSON( file = route_file )
> routesR_zone1_to_zone20
$routes
list()

$status
[1] "NOT_FOUND"

Thus, although the coordinates are the same as in the previous example, this 
time, no route is found.

I suppose that the problem is that, when the url is accessed, lat_or etc are 
not "translated" in the corresponding numeric values, and that Google tries to 
calculate the route between the literals " lat_or,long_or" and " 
lat_des,long_des".

Does anyone have a suggestion on how to circumvent the problem?




Laurent Franckx, PhD
VITO NV
Boeretang 200, 2400 MOL, Belgium
Tel. + 32 14 33 58 22
Skype: laurent.franckx
laurent.fran...@vito.be
Visit our website: www.vito.be/english and http://www.vito.be/transport

Discover how VITO initiates the transition towards a sustainable society: 
www.vito.be/sustainabilityreport2012


VITO Disclaimer: http://www.vito.be/e-maildisclaimer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] functions and matrices

2013-07-02 Thread Naser Jamil
Thanks to everyone for such nice illustrations. It will guide me for sure.


On 2 July 2013 02:56, David Winsemius  wrote:

>
> With permission I offer this exchange. Rolf and I have different notions
> of what u %*% v should mean, but the arbiter is obviously the original
> poster:
>
> Begin forwarded message:
>
> > From: David Winsemius 
> > Subject: Re: [R] functions and matrices
> > Date: July 1, 2013 6:21:09 PM PDT
> > To: Rolf Turner 
> >
> >
> > On Jul 1, 2013, at 5:09 PM, Rolf Turner wrote:
> >
> >> On 02/07/13 11:37, David Winsemius wrote:
> >>> On Jul 1, 2013, at 3:32 PM, Rolf Turner wrote:
> >>>
>  Basically R does things *numerically* and what you want to do really
>  amounts to symbolic manipulation.  Of course R could be cajoled into
>  doing it --- see fortune("Yoda") --- but probably only with a great
> deal of
>  effort and code-writing.
> 
>  OTOH you could quite easily write a function that would calculate
>  det(u%*%v)(x) for any given numerical value of x:
> 
>  foo <- function(a,b,x){
>    a1 <- apply(a,c(1,2),function(m,x){m[[1]](x)},x=x)
>    b1 <- apply(b,c(1,2),function(m,x){m[[1]](x)},x=x)
>    det(a1%*%b1)
>  }
> 
>  Then doing
> 
>    foo(u,v,2)
> >>> I would have thought that (u %*% v) would be:
> >>>
> >>>  u[1,1]( v[1,1](x) ) + u[1,2]( v[2,1](x) )   u[1,1]( v[1,2](x) ) +
> u[1,2]( v[2,2](x) )
> >>>  u[2,1]( v[1,1](x) ) + u[2,2]( v[2,1](x) )   u[2,1]( v[2,1](x) ) +
> u[2,2]( v[2,2](x) )
> >>>
> >>> (Crossing my fingers that I got the row and column conventions correct
> for matrix multiplication.)
> >>>
> >> 
> >>
> >> Not quite sure what you're getting at here.  It looks to me that you are
> >> calculating the *composition* of the functions rather than their
> *product*.
> >
> > Exactly. That is how I understood successive application of functions
> embedded in matrices . The symbol used in my differential topology course
> lo those 40 years ago was an open circle, but I assumed the OP wanted
> something along those lines to perform a composite mapping:
> >
> > compose <- function(u, v, x) matrix( c(
> >   u[1,1][[1]]( v[1,1][[1]](x) ) + u[1,2][[1]]( v[2,1][[1]](x) ) ,
> >   u[1,1][[1]]( v[1,2][[1]](x) ) + u[1,2][[1]]( v[2,2][[1]](x) ),
> >   u[2,1][[1]]( v[1,1][[1]](x) ) + u[2,2][[1]]( v[2,1][[1]](x) ),
> >   u[2,1][[1]]( v[2,1][[1]](x) ) + u[2,2][[1]]( v[2,2][[1]](x) ) ),
> 2,2,byrow=TRUE)
> >
> > compose(u,v,2)
> > [,1][,2]
> > [1,]   751332
> > [2,] 5427 1680128
> >
> > (Noting that I may have reversed the roles of u and v.)
> >
> >>
> >> I.e. you are taking the (i,j)th entry of "u%*%v" (evaluated at x) to be
> the
> >> sum over k of
> >>
> >>   u[i,k](v[k,j](x))
> >>
> >> This is not what I understood the OP to want.  I assumed he wanted the
> >> product of the function values rather than the composition of the
> functions,
> >> i.e. that he wanted the (i,j)th entry to be the sum over k of
> >>
> >>   u[i,k](x) * v[k,j](x)
> >>
> >> which is what my function provides.  This seems to me to be the most
> >> "reasonable" interpretation, but I could be wrong.
> >>
> >> BTW --- you cannot actually do u[i,k](x).  E.g.
> >>
> >>   u[1,2](2)
> >>
> >> gives "Error: attempt to apply non-function".  One needs to do
> u[1,2][[1]](2)
> >> (which gives 4, as it should).
> >
> > Yes. I was playing fast and loose with notation. I didn't think the code
> would really run as offered.I was a bit surprise that this worked, but I
> suppose you bear credit (and blame?) for pushing my program closer to
> completion.
> >
> >> v[1,1][[1]]( u[1,1][[1]]( 2 ))
> > [1] 11
> >
> > Any problem with me copying this to the list?
> >
> >
> >>
> >>   cheers,
> >>
> >>   Rolf
> >
> > Best;
> >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
>
> David Winsemius
> Alameda, CA, USA
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ComBat: Error in solve.default(t(des) %*% des) : Lapack routine dgesv: system is exactly singular

2013-07-02 Thread giulia benedetti
Hello,
I have been trying to run the ComBat function for a while on my gene array
data but don't manage to make it work. I always get the following error:
Error in solve.default(t(des) %*% des) : Lapack routine dgesv: system is
exactly singular: U[34,34]=0.

I know it is a matrix error but I do not know how to solve it.

Here are the lines I wrote:

library("sva")

mydata = read.table("normalized_data_ filtered_low.txt")

sampleinfo = read.table("sampleinfo.txt", header=TRUE)

batch = sampleinfo$Batch

mod <- model.matrix(~as.factor(sampleinfo$Phenotype) +
~as.factor(sampleinfo$Status) + ~as.factor(sampleinfo$RF) +
~as.factor(sampleinfo$Responder) + sampleinfo$Age + sampleinfo$DAS +
sampleinfo$TJC28 + sampleinfo$SJC28)
ComBat(dat, batch, mod, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)


After running, it finds 8 covariates and 33 batches, and I have 202 samples.


The tab files are the sampleinfo file and the gene expression file (contents
not shown here).

Could you please tell me what could be done to solve this issue.

Giulia

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Word occurrence rate in a tweet

2013-07-02 Thread Bembi Prima
Hi all,

Currently I am working on code that will calculate the word occurrence rate
in tweets.
First, I have 'tweets', which contains all the tweets I grabbed, and I make
'words', which contains all the unique words in 'tweets'.
After that I use sapply to calculate the probability of a word appearing in
'tweets'.
The main problem is speed: before using sapply, I used a simple for loop that
takes a really long time to finish, but at least I could print a simple ETA
inside the loop. After I learned to use sapply and implemented it in the code,
speed improved greatly, but I don't know the ETA, so I am just waiting for the
result to appear.
Using just 5% of the data I have waited for hours and R is still busy with
no output.
Is there a faster solution or a useful package that could help with my problem?

Here is my code :

sample.num<-10

tweets<-read.csv('data_conv.csv', sep=',', header=TRUE, stringsAsFactors =
FALSE)
tweets.num<-dim(tweets)[1]
tweets<-tweets[sample(1:tweets.num,sample.num,replace=FALSE)]
tweets.num<-length(tweets)

words<-paste(tweets,collapse=' ')
words<-gsub("\\\r\\\n", " ", words,ignore.case=TRUE,perl=TRUE) # remove
newlines
words<-gsub(" *\\d+ *", " ", words,ignore.case=TRUE,perl=TRUE) # remove
digits
words<-gsub("[^\\w@]+", " ", words,ignore.case=TRUE,perl=TRUE) # remove
nonwords
words<-unique(as.data.frame(strsplit(tolower(words),split=' '))) # unique
words
words<-words[order(words),] # sort it
words<-as.character(words)
words.num<-length(words)

result<-as.data.frame(words)
result$prob<-0
result$prob<-sapply(1:words.num,function(i)sum(grepl(sprintf('\\b%s\\b',words[i]),
tweets, ignore.case = TRUE, perl = TRUE))/tweets.num) # Long time spent here
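
A vectorized sketch of one faster route, under the assumption that the goal is the fraction of tweets containing each word at least once and that 'tweets' is a character vector (the cleaning here is simplified relative to the code above):

clean     <- tolower(gsub("[^[:alnum:]@]+", " ", tweets))   # keep word and @ characters
per_tweet <- lapply(strsplit(clean, " +"),
                    function(w) unique(w[nzchar(w)]))       # unique words per tweet
doc_freq  <- table(unlist(per_tweet))                       # number of tweets containing each word
result    <- data.frame(words = names(doc_freq),
                        prob  = as.numeric(doc_freq) / length(tweets))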

Thank you,
Bembi

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing each level of a factor to the global mean

2013-07-02 Thread Adams, Jean
You could subtract the mean from the response before fitting, then fit a
model without an intercept.  That would give you four parameters (one for
each level) relative to the mean ...

summary(lm(weight-mean(weight) ~ Diet -1, ChickWeight))
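
A related sketch (an alternative, not a correction): with sum-to-zero contrasts each coefficient is a level's deviation from the average of the level means, and the omitted level's deviation is minus their sum.

fit <- lm(weight ~ Diet, data = ChickWeight, contrasts = list(Diet = contr.sum))
summary(fit)          # Diet1..Diet3: deviations of diets 1-3 from the average level mean
-sum(coef(fit)[-1])   # deviation of the 4th diet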

Jean


On Thu, Jun 27, 2013 at 5:47 PM, Shaun Jackman  wrote:

> Hi Jean,
>
> contr.treatment(4) shows what the default contrast matrix looks like
> for a factor with 4 levels. What function do I use to create a
> contrast matrix to compare each level with the global mean (four
> comparisons in total), and produce a table similar to `summary.lm`?
>
> Thanks,
> Shaun
>
>
> On 26 June 2013 05:50, Adams, Jean  wrote:
> > Shaun,
> >
> > See the help on contrasts ...
> >  ?contr.treatment
> >
> > Jean
> >
> >
> > On Tue, Jun 25, 2013 at 7:07 PM, Shaun Jackman 
> wrote:
> >>
> >> Hi,
> >>
> >> I've used `lm` to create a linear model of a continuous variable
> >> against a factor variable with four levels using an example R data set
> >> (see below). By default, it uses a treatment contrast matrix that
> >> compares each level of the factor variable with the first reference
> >> level (three comparisons in total). I'd like to compare each level
> >> with the global mean (four comparisons in total), and produce a table
> >> similar to `summary.lm`. How do I go about this?
> >>
> >> ```r
> >> model <- lm(weight ~ Diet, ChickWeight)
> >> summary(model)
> >> ```
> >>
> >> Thanks,
> >> Shaun
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Replacing strings to numbers

2013-07-02 Thread Jeremy Ng
Hi guys,

I was wondering if anyone is able to help me with a problem that I was stuck
with for a long time. It involves the replacement of character strings with
numbers. The character string can take on only 3 possible values, for
instance:

AA
AT
TT

I would want R to replace AT with 0. Between AA and TT, I want to compare
the frequency of either value, and then for the one which occurs more, I
want it to be replaced with 1, and the other with -1. So using the same
example, say, I have

AA - 50
AT-34
TT- 57

I would want R to substitute it in this way:
AA= -1
AT=0
TT = 1

The strings are not necessarily AA,AT, or TT.

Any ideas?

Thanks!
Jeremy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cross validation in R

2013-07-02 Thread Adams, Jean
This code is untested, since you did not provide any example data.  But it
may help you get started.

Jean

mydata <- read.csv(file.choose(), header=TRUE)
library(ROCR)  # ROC curve and assessment of my prediction

plot(0:1, 0:1, type="n", xlab="False positive rate", ylab="True positive
rate")
abline(0, 1, col="red")

nsim <- 5
auc <- rep(NA, nsim)
for(i in 1:nsim) {
  select <- sample(nrow(mydata), round(nrow(mydata) * 0.7))
  data70 <- mydata[select, ]   # training set
  data30 <- mydata[-select, ]  # test set
  temp.glm <- glm(Death ~ Temperature, data = data70, family = binomial)
  data30$pred <- predict(temp.glm, newdata = data30, type = "response")  # predicted probabilities for the held-out 30%
  pred <- prediction(data30$pred, data30$Death)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf, add = TRUE)
  auc[i] <- attributes(performance(pred, "auc"))$y.values[[1]]  # area under the ROC
}
auc


On Tue, Jul 2, 2013 at 3:25 AM, Eddie Smith  wrote:

> Guys,
>
> I select 70% of my data and keep 30% of it for model validation.
>
> mydata <- read.csv(file.choose(), header=TRUE)
> select <- sample(nrow(mydata), nrow(mydata) * .7)
> data70 <- mydata[select,]  # select
> data30 <- mydata[-select,]  # testing
> temp.glm <- glm(Death~Temperature, data=data70,
> family=binomial(link="logit"))
>
> library(ROCR)  # ROC curve and assessment of my prediction
> pred <- prediction(data30$pred, data30$Death)
> perf <- performance(pred,"tpr","fpr")
> plot(perf); abline(0, 1, col="red")
> attributes(performance(pred, 'auc'))$y.values[[1]] # area under the ROC
>
> How do I make a loop so that the process can be repeated several times,
> producing random ROC curves and area-under-the-ROC values?
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Optimum of lm

2013-07-02 Thread S Ellison
 
> I am trying to do some DoE (design of experiments). I would like to find
> the optima (maxima) of a 2-dimensional lm with quadratic terms.
> I'm sure there is a really simple solution but I can't find it.
At least part of this problem is solved by the rsm package; that fits a 
quadratic response surface and provides the location of the maximum together 
with some diagnostics.
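
A small sketch of what that looks like, with made-up data on two coded factors (SO() requests the full second-order model; summary() reports the canonical analysis, including the stationary point):

library(rsm)
dat <- expand.grid(x1 = -1:1, x2 = -1:1)          # toy 3 x 3 design
set.seed(1)
dat$y <- 10 + 0.5 * dat$x1 - dat$x1^2 - dat$x2^2 + rnorm(9, sd = 0.1)
fit <- rsm(y ~ SO(x1, x2), data = dat)
summary(fit)   # includes the location of the stationary point (here a maximum)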

Steve Ellison


***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generalized Cholesky Inverse

2013-07-02 Thread Pascal Oettli
Hi,

Good. But you probably didn't carefully read the help page.

The function used is "solve", not "solve.gchol". For an object of class gchol,
"solve" will call "solve.gchol".

Regards,
Pascal
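
A minimal sketch of the intended usage (tiny made-up matrix; the point is only that solve() is called on the gchol object and method dispatch does the rest):

library(bdsmatrix)
m  <- matrix(c(4, 2, 2, 3), nrow = 2)   # small positive-definite matrix
gm <- gchol(m)                          # generalized Cholesky decomposition
solve(gm)                               # dispatches to solve.gchol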



2013/7/2 Tjun Kiat Teo 

> Yap. I did load the package "bdsmatrix"
>
>
> On Fri, Jun 21, 2013 at 8:26 AM, Pascal Oettli  wrote:
>
>> Hi,
>>
>> Did you load "bdsmatrix"?
>>
>> Regards,
>> Pascal
>>
>>
>>
>> 2013/6/21 Tjun Kiat Teo 
>>
>>> What are the possible options I have for a generalized Cholesky inverse in
>>> R? I tried to use the package bdsmatrix and was given the error message
>>> that function solve.gchol does not exist. Thanks
>>>
>>> Tjun Kiat
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cross validation in R

2013-07-02 Thread Eddie Smith
Guys,

I select 70% of my data and keep 30% of it for model validation.

mydata <- read.csv(file.choose(), header=TRUE)
select <- sample(nrow(mydata), nrow(mydata) * .7)
data70 <- mydata[select,]  # select
data30 <- mydata[-select,]  # testing
temp.glm <- glm(Death~Temperature, data=data70,
family=binomial(link="logit"))

library(ROCR)  # ROC curve and assessment of my prediction
pred <- prediction(data30$pred, data30$Death)
perf <- performance(pred,"tpr","fpr")
plot(perf); abline(0, 1, col="red")
attributes(performance(pred, 'auc'))$y.values[[1]] # area under the ROC

How do I make a loop so that the process can be repeated several times,
producing random ROC curves and area-under-the-ROC values?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generalized Cholesky Inverse

2013-07-02 Thread Tjun Kiat Teo
Yap. I did load the package "bdsmatrix"

On Fri, Jun 21, 2013 at 8:26 AM, Pascal Oettli  wrote:

> Hi,
>
> Did you load "bdsmatrix"?
>
> Regards,
> Pascal
>
>
>
> 2013/6/21 Tjun Kiat Teo 
>
>> What are the possible options I have for a generalized Cholesky inverse in
>> R? I tried to use the package bdsmatrix and was given the error message
>> that function solve.gchol does not exist. Thanks
>>
>> Tjun Kiat
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.