Re: [R] Question: RMySQL bulk load/update one column, dbWriteTable()?

2007-06-06 Thread Chris Stubben

> I have a question reading using RMySQL trying to load one R vector into a
> table column.  To be more specifically, the table is there populated.  Now I
> add a new column and want to populate this.
> 


Okay, this is more of an SQL question now, but you could just use dbWriteTable
and then do an multi-table update.



dbGetQuery(con, "select * from tmp")

  id name
1  1A
2  2B
3  3C
4  4D
5  5E


dbSendQuery(con, "alter table tmp add column r2 float")

## calculate some statistic for all or some ids in table


x<-dataframe(id=1:5, r2=c(.1, .4, .9, .4,.7))


dbWriteTable(con, "r2tmp",  x )


dbSendQuery(con, "update tmp t, r2tmp r set t.r2=r.r2 where t.id=r.id")


dbGetQuery(con, "select * from tmp")

  id name  r2
1  1A 0.1
2  2B 0.4
3  3C 0.9
4  4D 0.4
5  5E 0.7


Chris

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RMySQL question, sql with R vector or list

2007-06-05 Thread Chris Stubben
> dynamically, rather than statically like using paste.  One concern using
> paste to construct the SQL command is this: what about if the ID list in
> your sample becomes very large, is this a problem to construct this way?
> 


I have not messed with procedures in mysql 5, so I have no idea about the first
question.  However, the IN operator should be fast, even for large vectors.   
It takes me a second to retrieve 10,000 random records from a table with
100,000+ rows.

system.time(id.in( sample(1:10, 1) ))[3]
[1] 1.07


Also, I forgot to mention this before -  if your ID is a character field, you
can quote strings in the IN clause using the shell quote function.

paste(shQuote(ids),collapse=",")

Chris

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RMySQL question, sql with R vector or list

2007-06-05 Thread Chris Stubben

> I am trying to write a RMySQL sql script inside R such that part of the SQL
> would be R list or vector.  For example, I want to select * from Atable
> where ID would equal to a members of R list or vector of "1, 2, 3".  Here
> the ID list was generated inside R and then try to feed to mysql to call to
> get additional data sets.
> 


You could pass a comma-separated list of IDs to the sql IN operator


## in MySQL

CREATE table tmp (id int, name char(1));
insert into tmp values (1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E");



### in R


library(RMySQL)

con <- dbConnect("MySQL",  dbname="test" )


id.in<-function(ids)
{
 dbGetQuery(con,   paste("select * from tmp 
where id IN (", paste(ids,collapse=","), ")")  ) 
}



id.in(2:4)
  id name
1  2B
2  3C
3  4D


## simple lists also work

id.in(list(1,4,5))
  id name
1  1A
2  4D
3  5E


Chris

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Identifying last record in individual growth data over different time intervalls

2007-03-05 Thread Chris Stubben

> Finally I would like to have a data.frame t2 which only contains the 
> entries of the last measurements.
> 

You could also use aggregate to get the max year per plate then join that back
to the original dataframe using merge on year and plate (common columns in both
dataframes).



x<-data.frame(id=(1:8), plate=c(15,15,15,20,20,33,43,43),
year=c(2004,2005,2006,2004,2005,2004,2005,2006), 
height=c(0.40,0.43,0.44,0.90,0.94,0.15,0.30,0.38))

merge(x, aggregate(list(year=x$year), list(plate=x$plate), max))


  plate year id height
115 2006  3   0.44
220 2005  5   0.94
333 2004  6   0.15
443 2006  8   0.38

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] abbreviate dataframe for Sweave output

2007-02-08 Thread stubben
Thanks Charles and Max, both functions work great.  I also used a hack 
to replace row names with '.', '. ', '.  ', etc.

Chris

dot.df <- function(x, head = 3, tail=1, dotrows=2)
{
x <- format(rbind(head(x,head + dotrows), tail(x,tail)))
if(dotrows>0)
{
   x[(head + 1):(head + dotrows),] <- "."
   for(i in 1:dotrows){ rownames(x)[head+i]<-paste(".", substring("  
 ", 1, i-1))}
}
x
}

dot.df(crabs)
 sp sex index   FL   RW   CL   CW   BD
1B   M 1  8.1  6.7 16.1 19.0  7.0
2B   M 2  8.8  7.7 18.1 20.8  7.4
3B   M 3  9.2  7.8 19.0 22.4  7.7
..   . ......
..   . ......
200  O   F50 23.1 20.2 46.2 52.5 21.1



## or dot.matrix (replacing latex commands for display here)

dot.matrix( crabs)
 sp sex ...   CW   BD
1B   M ...   197
2B   M ... 20.8  7.4
..   .  . ..
199  O   F ... 48.7 19.8
200  O   F ... 52.5 21.1


>> foo(crabs)
> sp  sex index FL RW CL CW BD
> 1   "B" "M" " 1"  " 8.1" " 6.7" "16.1" "19.0" " 7.0"
> 2   "B" "M" " 2"  " 8.8" " 7.7" "18.1" "20.8" " 7.4"
> 3   "B" "M" " 3"  " 9.2" " 7.8" "19.0" "22.4" " 7.7"
> .   "." "." "."   "."".""."".""."
> .   "." "." "."   "."".""."".""."
> 200 "O" "F" "50"  "23.1" "20.2" "46.2" "52.5" "21.1"
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] abbreviate dataframe for Sweave output

2007-02-06 Thread stubben
I wanted to print the first and last rows of some dataframes in Sweave 
using dots in columns to separate the two parts.  Head and tail almost 
work, but I have problems with factors and row names.

z<-data.frame(id=letters[1:26], x=sample(1:26,26))

rbind(head(z,3), ".", tail(z,1))

  id  x
1 a 18
2 b  8
3 c 14
4.
26z 10
Warning message:
invalid factor level, NAs generated in...


I would like something like this if possible.  Any ideas?

  id  x
1 a 18
2 b  8
3 c 14
. .  .
. .  .
26z 10


Thanks,

Chris Stubben



-- 
-

Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to store an array in MySQL

2006-09-20 Thread Chris Stubben
 
> Hi all, does somebody know how to store an array in MySQL with the
> package RMySQL. Thanks in advance.
> 

A similar question was asked last month. 

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/81429.html

In a normalized database, you should store the index and value in separate
columns.  Try this...


a<-array(1:3, c(2, 4))
 [,1] [,2] [,3] [,4]
[1,]1321
[2,]2132


d <-dim(a)


z<-cbind(expand.grid(r=1:d[1],c=1:d[2]),x=as.vector(a))
  r c x
1 1 1 1
2 2 1 2
3 1 2 3
4 2 2 1
5 1 3 2
6 2 3 3
7 1 4 1
8 2 4 2


library(RMySQL)

con<-dbConnect(MySQL(), dbname="test")

dbWriteTable(con, "array", z, row.names=FALSE)
[1] TRUE


---

Now in Mysql 

select * from array;
+--+--+--+
| r| c| x|
+--+--+--+
|1 |1 |1 |
|2 |1 |2 |
|1 |2 |3 |
|2 |2 |1 |
|1 |3 |2 |
|2 |3 |3 |
|1 |4 |1 |
|2 |4 |2 |
+--+--+--+



select group_concat(x order by c separator ' ' ) as a from array group by r;
+-+
| a   |
+-+
| 1 3 2 1 |
| 2 1 3 2 |
+-+


Chris Stubben

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bootstraping for groups and subgroups and joing with other table

2006-08-30 Thread Chris Stubben
>   I have a table with following collumns: State, SamplePlot, Species and
BodySize. I sampled bird species at
> 34 SamplePlots and 5 States (regions) monthly during two years. On each bird
record I measured bodysize
> and identified the species. So I have many records of each species (about 150
species) at each SamplePlot
> and each Region (State). 
> 
>   Now I would like bootstrap these data, selecting 50 records for each
State/SamplePlot combinations and
> count how many species (richness) were sampled at bootstrap. I need to do this
1.000 times. 
> 
>   After that and need join the number of species [obtained at each bootstrap
and for each State/SamplePlot
> combination] with a dataframe that have other attributes for SamplePlot (like
Area, Perimeter etc). 



I asked a similar question earlier... IF you have data frame birds and
bird.plots, maybe something like this.

#initialize empty variable 
boot<-NULL

for(i in 1:100)
{
   ## split on state and site and create list a
   a <-split(birds, paste(birds$State,birds$SampleSite), drop=T)

   # sample 50 rows each OR by number of observations (better?)
   # b<-lapply( a, function(x) x[sample(nrow(x), 50, replace=T),])
 b<-lapply( a, function(x) x[sample(nrow(x), replace=T),])

   ## count number of unique species or other statistic? 
   ###  and add row to boot matrix
   boot<-rbind(boot, unlist( lapply(b, function (x) length(unique(x$Species)) ) 
))
}

## mean

y<-apply(boot, 2, mean)


## convert to data frame for merge

y<-data.frame(y)

names(y)<-"boot.count"


## add row names to bird.plots for easy join
rownames(bird.plots)<-paste(bird.plots$State,bird.plots$SampleSite)

merge(bird.plots,y, by=0)

  Row.names  State SampleSite Area boot.count
1   Bahia Site1  Bahia  Site1   10   1.00
2   Bahia Site2  Bahia  Site2   25   1.96
3   Bahia Site3  Bahia  Site3   70   1.72
4   Bahia Site4  Bahia  Site4   15   1.73
5   Bahia Site5  Bahia  Site55   1.42
6  RioJaneiro Site1 RioJaneiro  Site1   32   2.49
7  RioJaneiro Site2 RioJaneiro  Site2   45   1.63
8  RioJaneiro Site3 RioJaneiro  Site3   10   2.37
9SaoPaulo Site1   SaoPaulo  Site1   23   2.41
10   SaoPaulo Site2   SaoPaulo  Site2   45   2.57


Chris Stubben

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] analyze amino acid sequence (composition)of proteins

2006-06-19 Thread stubben
Have you checked the package seqinR at CRAN?  It may help.

Also, if you split a string into single characters using strsplit, then 
use table to count characters.


seq<-"ATGAAC"


table(strsplit(seq, ""))

A C G T
3 1 1 1




 > 3. based on "AAABBB",how can i get some statistics of this string 
such as how
 > many letters,how many "A"s in the string.
-- 

-
Chris Stubben

Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] In which application areas is R used?

2006-01-23 Thread stubben
Population biology

It's dominated by matlab, but I see R used more and more.

http://popstudies.stanford.edu/summer_course.html


Chris


-- 
-
Chris Stubben

Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Sample rows in data frame by subsets

2006-01-23 Thread Chris Stubben
Hi,

I need to resample rows in a data frame by subsets

L3 <- LETTERS[1:3]
d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, repl=TRUE))
x  y fac
1  1  1   A
2  1  2   A
3  1  3   A
4  1  4   A
5  1  5   C
6  1  6   C
7  1  7   B
8  1  8   A
9  1  9   C
10 1 10   A

I have seen this used to sample rows with replacement

d[sample(nrow(d), replace=T), ]

 x  y fac
7   1  7   B
2   1  2   A
1   1  1   A
3   1  3   A
2.1 1  2   A
10  1 10   A
8   1  8   A
9   1  9   C
1.1 1  1   A
8.1 1  8   A


but I would like to sample based on the original number in fac

summary(d$fac)
A B C
6 1 3


rbind(subset(d, fac=="A")[sample(6, replace=T), ],
   subset(d, fac=="B")[sample(1, replace=T), ],
   subset(d, fac=="C")[sample(3, replace=T), ] )

 x  y fac
2   1  2   A
3   1  3   A
3.1 1  3   A
1   1  1   A
10  1 10   A
1.1 1  1   A
7   1  7   B
5   1  5   C
6   1  6   C
5.1 1  5   C


Is there an easy way to do this in one step or with a short function?  I 
have lots of dataframes to resample.

Thanks,

Chris


-- 
-
Chris Stubben

Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Constructing a transition matrix

2005-12-07 Thread Chris Stubben
Hi Peter,

Thanks for pointing out the set functions.  I can use setdiff to find 
missing rows

setdiff(dev, rownames(A))
[1] "seed"

and intersect to find common rows

d1<- intersect(dev, rownames(A) )
[1] "veg" "rep"

I was trying to use a negative index like A[-1,] to remove the dead row, 
but d1 is a better solution.  Now I can add the missing seed row and get 
a square matrix.

rbind( seed=numeric(3), A[d1,] )[dev,dev]


Another post by Hans Gardfjell suggested reordering factor levels before 
using prop.table(table()) and this solution works great!

trans$class <- ordered(trans$class, levels=dev)
trans$fate <- ordered(trans$fate, levels=c(dev,"dead") )


A <- t(prop.table(table(trans$class, trans$fate),1))[-4,]

 seed veg rep
   seed 0.000   0 0.0
   veg  0.667   0 0.5
   rep  0.000   1 0.5


Thanks for the help,

Chris



Peter Dalgaard wrote:

> Are you looking for something like
> 
> d1 <- setdiff(dev,"seed")
> A0[d1,dev] <- A[d1,dev]
> 
> ?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Constructing a transition matrix

2005-12-06 Thread Chris Stubben
Hi again,

I almost figured this out, but still need some help on the last part.

I can use prop.table to get survival probabilities...

A <- t(prop.table( table(trans$class, trans$fate),1) )

 rep  seed   veg
   dead 0.000 0.333 0.000
   rep  0.500 0.000 1.000
   veg  0.500 0.667 0.000


so now I just need to format the matrix.  I thought I could create a 
matrix of zeroes using size class names,

dev<- c("seed","veg", "rep").


A0<-matrix(numeric(9), nrow=3, dimnames=list(dev,dev) )
  seed veg rep
seed0   0   0
veg 0   0   0
rep 0   0   0


but how do I assign values in A to the corresponding rows and columns in 
A0?  I hope there is an easy solution that I'm overlooking.


  seed veg   rep
seed0   0 0
veg  0.67   0   0.5
rep 0   1   0.5


Thanks,

Chris



Chris Stubben wrote:
> Hi,
> 
> I would like to construct a transition matrix from a data frame with 
> annual transitions of marked plants.
> 
> plant<-c(1:6)
> class<-c("seed","seed", "seed", "veg", "rep", "rep")
> fate<-c("dead", "veg","veg","rep", "rep", "veg")
> 
> trans<-data.frame(plant, class, fate)
> 
>   plant class fate
> 1 1  seed dead
> 2 2  seed  veg
> 3 3  seed  veg
> 4 4   veg  rep
> 5 5   rep  rep
> 6 6   rep  veg
> 
> I have been using sql queries to do this, but I would like to construct 
> the matrix in R since I plan to resample transitions using 
> trans[sample(nrow(trans), 6, replace=T), ]
> 
> I know I can get the original size vector using table()
> 
> data.matrix(table(trans$class))
>  [,1]
> rep 2
> seed3
> veg 1
> 
> 
> but I don't know how to get counts of each class-fate combination where 
> fate does NOT equal dead
> 
> seed veg  = 2
> veg  rep  = 1
> rep  rep  = 1
> rep  veg  = 1
> 
> 
> or how to divide the class-fate count by the original class count in the 
> size vector to get survival probabilities
> 
> seed veg  = 2 / 3 seed = 0.67
> veg  rep  = 1 / 1 veg  = 1
> rep  rep  = 1 / 2 rep  = 0.5
> rep  veg  = 1 / 2 rep  = 0.5
> 
> 
> or construct the square matrix with rows and columns in the same 
> developmental sequence like dev<- c("seed","veg", "rep").
> 
>  seed veg   rep
> seed0   0 0
> veg  0.67   0   0.5
> rep 0   1   0.5
> 
> Any help or suggestions would be appreciated.
> Thanks,
> 
> 
> Chris Stubben
> 
> 
> -- 
> Los Alamos National Lab
> BioScience Division
> MS M888
> Los Alamos, NM 87545
> 
> 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Constructing a transition matrix

2005-12-06 Thread Chris Stubben
Hi,

I would like to construct a transition matrix from a data frame with 
annual transitions of marked plants.

plant<-c(1:6)
class<-c("seed","seed", "seed", "veg", "rep", "rep")
fate<-c("dead", "veg","veg","rep", "rep", "veg")

trans<-data.frame(plant, class, fate)

   plant class fate
1 1  seed dead
2 2  seed  veg
3 3  seed  veg
4 4   veg  rep
5 5   rep  rep
6 6   rep  veg

I have been using sql queries to do this, but I would like to construct 
the matrix in R since I plan to resample transitions using 
trans[sample(nrow(trans), 6, replace=T), ]

I know I can get the original size vector using table()

data.matrix(table(trans$class))
  [,1]
rep 2
seed3
veg 1


but I don't know how to get counts of each class-fate combination where 
fate does NOT equal dead

seed veg  = 2
veg  rep  = 1
rep  rep  = 1
rep  veg  = 1


or how to divide the class-fate count by the original class count in the 
size vector to get survival probabilities

seed veg  = 2 / 3 seed = 0.67
veg  rep  = 1 / 1 veg  = 1
rep  rep  = 1 / 2 rep  = 0.5
rep  veg  = 1 / 2 rep  = 0.5


or construct the square matrix with rows and columns in the same 
developmental sequence like dev<- c("seed","veg", "rep").

  seed veg   rep
seed0   0 0
veg  0.67   0   0.5
rep 0   1   0.5

Any help or suggestions would be appreciated.
Thanks,


Chris Stubben


--
Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html