Re: [R] cmprsk - another survival-dependent package causes R crash

2009-03-30 Thread Thomas Lumley


Yes, there are other packages with incompatibilities with the new version of 
'survival'.  The maintainers of all the packages that fail R CMD check have 
been notified and given suggestions for how to update. You can see 
which packages fail CMD check by looking at the CRAN check results linked from 
the package's CRAN page.

If you find that the change has caused problems for a package but hasn't caused 
a CMD check failure, then it would be helpful to report it, since we might not 
know.  Otherwise I think you can assume that an update is already on someone's 
to-do list.

-thomas


On Mon, 30 Mar 2009, Nguyen Dinh Nguyen wrote:


Dear Prof Gray and everyone,

As our package developers discussed the incompatibility between the Design and 
survival packages, I faced another problem with cmprsk, a survival-dependent 
package.
The problem is exactly like what happened with the Design package: when I 
started running the cuminc function, R suddenly closed.
These incidents suggest that many other survival-dependent packages may be 
affected by the problem.
Could you please consider the matter?

My R version: 2.8.1
Windows XP Service Pack 3

Regards
Nguyen D Nguyen
Garvan Institute, Australia




Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



[R] Sum of character vector

2009-03-30 Thread David A.G

Dear list,

I am trying to evaluate how many elements in a vector equal a certain value. 
The vectors are the columns of a data.frame, read in using read.table():

> dim(data)
[1] 2600  742
> data[1:5,1:5]
  SNP001 SNP002 SNP003 SNP004 SNP005
1 GG AA TT TT GG
2 GG AA TC TT GG
3 GG AC CC TT GG
4 AG AA TT TT GG
5 GG AA TC TT GG

> table(data[,1])

  AA   AG   GG 
 251 1093 1252 

but if I do

> sum(data[,1]=="GG")
[1] NA

I have tried storing the column in a vector, but with the same results:

> yyy<-(data[,1])
> sum(yyy=="GG")
[1] NA

while if I just get a small number of elements from this vector, it works fine
> zzz <- yyy[1:10]
> zzz
 [1] "GG" "GG" "GG" "AG" "GG" "GG" "AA" "GG" "AG" "GG"
> table(zzz)
zzz
AA AG GG 
 1  2  7 
> sum(zzz=="GG")
[1] 7

I checked the archives for help but couldn't find my error.
What am I missing?

> sessionInfo()
R version 2.8.1 (2008-12-22) 
i386-pc-mingw32 


attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

Thanks 

Dave


_


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] interpreting "not defined because of singularities" in lm

2009-03-30 Thread jiblerize22
I run lm to fit an OLS model where one of the covariates is a factor with 30 
levels. I use contr.treatment() to set the base level of the factor, so when I 
run lm() no coefficients are estimated for that level. But in addition (and 
regardless of which level I choose to be the base), lm also gives a vector of 
NA coefficients for another level of my factor.

The output says that these coefficients were "not defined because of 
singularities", suggesting maybe that the 28 estimated coefficients are 
sufficient to pin down the 29th... but why is this the case? Why am I going 
from 30 levels to 28 coefficients? Am I misunderstanding the way factors/levels 
are supposed to work?

Thanks for any suggestions.



  


Re: [R] Sum of character vector

2009-03-30 Thread Usuario R
Hi,

This: sum(data[,1]=="GG") may not work because you have some NA in your
data. Try this:

sum( data[ , 1 ] == "GG", na.rm = TRUE )
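A quick illustration of why the NA appears (toy vector; any NA in the
comparison makes sum() return NA unless na.rm is set):

x <- c("GG", "AG", NA)
sum(x == "GG")                # NA, because NA == "GG" evaluates to NA
sum(x == "GG", na.rm = TRUE)  # 1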

Regards



2009/3/30 David A.G 

>
> Dear list,
>
> I am trying to evaluate how many elements in a vector equal a certain
> value. The vectors are the columns of a data.frame, read in using
> read.table():
>
> > dim(data)
> [1] 2600  742
> > data[1:5,1:5]
>  SNP001 SNP002 SNP003 SNP004 SNP005
> 1 GG AA TT TT GG
> 2 GG AA TC TT GG
> 3 GG AC CC TT GG
> 4 AG AA TT TT GG
> 5 GG AA TC TT GG
>
> > table(data[,1])
>
>  AA   AG   GG
>  251 1093 1252
>
> but if I do
>
> > sum(data[,1]=="GG")
> [1] NA
>
> I have tried storing the column it in a vector but with same results:
>
> > yyy<-(data[,1])
> > sum(yyy=="GG")
> [1] NA
>
> while if I just get a small number of elements from this vector, it works
> fine
> > zzz <- yyy[1:10]
> > zzz
>  [1] "GG" "GG" "GG" "AG" "GG" "GG" "AA" "GG" "AG" "GG"
> > table(zzz)
> zzz
> AA AG GG
>  1  2  7
> > sum(zzz=="GG")
> [1] 7
>
> I checked the archives for help but couldn't find my error.
> What am I missing?
>
> > sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> Thanks
>
> Dave
>



Re: [R] Warning messages in Splancs package :: no non-missing arguments to min; returning Inf

2009-03-30 Thread D
Dear Barry,

I am new to R and I am sorry for the missing information. I am using R
2.8.1 for Windows.

Does the error come from the kernel2d function, or from the image function?
-- It comes from the kernel2d function.

Does it do that for any data points?
-- Not yet; I have tried only the two Shapefiles mentioned. They
are correctly structured.

Have you read the help for kernel2d? Have you tried the example in the
help(kernel2d) text?
-- I will look into it, thanks!

Have you ever had it work?
-- No; I tried it on two separate installations.

Thanks for the solution, I will test it right away.

Dejan

On Mon, Mar 30, 2009 at 09:43, Barry Rowlingson
 wrote:
> On Mon, Mar 30, 2009 at 7:12 AM, D  wrote:
>> Hi,
>>
>> I would need some help with the splancs package in R.
>>
>> I am using a Shapefile (downloadable at)
>> http://rapidshare.com/files/215206891/Redlands_Crime.zip
>>
>> and the following execution code
>>
>>
>> setwd("C:\\Documents and
>> Settings\\Dejan\\Desktop\\GIS\\assignment6\\DataSet_Redlands_Crime\\Redlands_Crime")
>> library(foreign)
>> library(splancs)
>> auto_xy<-read.dbf("Auto_theft_98.dbf")
>> rob_xy<-read.dbf("Robbery_98.dbf")
>> auto.spp<-as.points(auto_xy$x/1000, auto_xy$y/1000)
>> rob.spp<-as.points(rob_xy$x/1000, rob_xy$y/1000)
>> image(kernel2d(auto.spp, bbox(auto.spp), h0=4, nx=100, ny=100),
>> col=terrain.colors(10))
>> pointmap(auto.spp, col="red", add=TRUE)
>>
>> I would need to analyze the relationship between the two Shapefiles,
>> but I am receiving the following warning message and a blank output
>>
>>
>> Xrange is  1827.026 6796.202
>> Yrange is  1853.896 6832.343
>> Doing quartic kernel
>> Warning messages:
>> 1: In min(x) : no non-missing arguments to min; returning Inf
>> 2: In max(x) : no non-missing arguments to max; returning -Inf
>>
>>
>> Can someone tell me what I am doing wrong in the execution code?
>> I am getting a blank graph.
>
> Well, do some investigation. Does the error come from the kernel2d
> function, or from the image function? Does it do that for any data
> points? Have you read the help for kernel2d? Have you tried the
> example in the help(kernel2d) text? Have you ever had it work? What
> version of R are you using and so on. Please read the posting guide.
>
>  The manual for kernel2d says that the second argument has to be "A
> splancs polygon data set". But you've given it bbox(auto.spp), which
> returns a matrix with the wrong structure - the columns are min and
> max and the rows are X and Y. Plus it has only the corner points, not
> all four points of the box, which splancs says it needs.
> There's also a clue in your output:
>
>  Xrange is  1827.026 6796.202
>  Yrange is  1853.896 6832.343
>
>  - but if you plot the points (which you should always do, to make
> sure you've read them in properly) you should see that the X range
> should be 6796 to 6832 and the Y range should be 1827 to 1853.
>
>  Solution: use the splancs bboxx() function that converts an sp-style
> bounding box to a splancs-style bounding box:
>
>  k = kernel2d(auto.spp, bboxx(bbox(auto.spp)), h0=4, nx=100, ny=100)
>  image(k)
>
>  Barry
>



Re: [R] Sum of character vector

2009-03-30 Thread David A.G

Thanks, that solved it!




Date: Mon, 30 Mar 2009 09:55:17 +0200
Subject: Re: [R] Sum of character vector
From: r.user.sp...@gmail.com
To: dasol...@hotmail.com
CC: r-help@r-project.org

Hi, 

This: sum(data[,1]=="GG") may not work because you have some NA in your data. 
Try this:

sum( data[ , 1 ] == "GG", na.rm = TRUE )



Regards






Re: [R] forecasting issue

2009-03-30 Thread totallyunimodular

For what it's worth, I am having the same issue. Specifically, I am using R
2.8.1 on Windows XP, applying auto.arima to the data from the NN5 forecasting
competition (http://www.neural-forecasting-competition.com/datasets.htm),
series NN-101 through NN-111. The relevant code is

 library(forecast)  # auto.arima comes from the forecast package
 library(RODBC)
 channel <- odbcConnectExcel("NN5_FINAL_DATASET_WITH_TEST_DATA.xls")
 alldata <- sqlFetch(channel, "NN5 COMPLETE Data")
 odbcClose(channel)
 series <- alldata[17:751,102:112]
 actualWithdrawls <- alldata[752:807,102:112]
 fit <- auto.arima(series[,i], stationary=FALSE, ic="aic", max.p=12,
max.q=3, stepwise=TRUE)
 tmp = predict(fit, n.ahead=56)
 forecast = tmp$pred

As habby reported, every time the optimal model found includes drift, the
call to predict results in 

  Error in predict.Arima(fit, n.ahead = 56) : 
 'xreg' and 'newxreg' have different numbers of columns

I have found other threads on this same issue with no responses. I am a
fairly new R user, so maybe there is something basic I am doing
incorrectly...

I found some interesting, seemingly relevant discussion at
http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm but have yet to
digest it all. 

My basic problem is how to set up auto.arima to be as automated as possible.
I had written a for loop to crunch through all of the series in from the NN5
competition and experiment with different auto.arima settings and compare
out of sample forecast accuracy. But, having run into this issue, it's
unclear what the cause is and if/how it can be avoided.
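One hedged suggestion: the forecast package's own forecast() method knows
about the drift regressor that auto.arima adds to the model, whereas
stats::predict() expects a matching newxreg. As a sketch:

fc <- forecast(fit, h = 56)   # instead of predict(fit, n.ahead = 56)
forecast <- fc$mean           # point forecasts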

Thanks for any ideas. 
 



[R] How to generate a new column according to some rule?

2009-03-30 Thread minben
In a data frame I have a column "date" and a column "time"; now I want
to generate a new column which is the mean of the values of "time",
grouped by "date". In Stata the command is

egen scalls = mean(time),by(date)

but I don't know the command in R, can anybody help me?



[R] How do I add a rug to a 3d 'persp' plot?

2009-03-30 Thread RJCallahan

Hi all,

I have a (hopefully quick) question. I've got a fascinating set of fitted
surfaces in three dimensions corresponding to local linear multiple
regressions. I'd like to add rugs to the X and Y axes (corresponding to my
independent variables) in order to get a sense for how many data points I'm
working with to graph various portions of the surfaces. The trouble is, I
can't just plot the rugs with rug() because it's a 3-d plot. Similarly, I
can't just pass rug() into trans3d() because trans3d() requires a set of
three coordinates, right? And rug() internally calls on Axis(), which
doesn't correspond to anything that references points.

I'm thinking I'll have to draw the rugs manually by figuring out what set of
lines() commands would do the same thing to a plot as rug(), and then
passing those lines() commands as parameters into trans3d() as per the
example at the bottom of ?persp. I thought I'd ask, first, in case someone
else has this problem: is there an easy way to add rugs to a 3-d plot
generated using 'persp'? Thank you very much!

Sincerely, 

Richard Callahan
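
A minimal sketch of that manual approach, using trans3d() (in grDevices in
recent R; older versions define it in the ?persp example) on a toy surface.
Here x.obs is a hypothetical vector of observed predictor values:

x <- y <- seq(-3, 3, length = 30)
z <- outer(x, y, function(u, v) exp(-(u^2 + v^2)))
pmat <- persp(x, y, z, theta = 30, phi = 25)      # keep the perspective matrix
x.obs <- rnorm(50)                                # hypothetical data points
tick <- 0.03 * diff(range(y))                     # tick length in data units
p1 <- trans3d(x.obs, min(y), min(z), pmat)        # tick base on the "floor" edge
p2 <- trans3d(x.obs, min(y) + tick, min(z), pmat) # tick tip
segments(p1$x, p1$y, p2$x, p2$y)                  # the rug, drawn in 2-D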


Re: [R] adding matrices with common column names

2009-03-30 Thread Murali.MENON
Benjamin, Dimitris,
Thanks very much. Neat work!
Murali

-Original Message-
From: Nutter, Benjamin [mailto:nutt...@ccf.org] 
Sent: 27 March 2009 13:52
To: MENON Murali; r-help@r-project.org
Subject: RE: [R] adding matrices with common column names

Shucks, Dimitris beat me to it.  And his code is a bit more elegant than
mine.  But since I did the work I may as well post it, right?

This version incorporates a couple of error checks to make sure all your
arguments are matrices with the same number of rows.

add.by.name <- function(...){
  args <- list(...)
  
  mat.test <- sapply(args,is.matrix)
  if(FALSE %in% mat.test) stop("All arguments must be matrices")

  mat.row <- unique(sapply(args,nrow))
  if(length(mat.row)>1) stop("All matrices must have the same number of rows")
  
  all.names <- unique(as.vector(sapply(args,colnames)))
  
  sum.mat <- matrix(0,nrow=mat.row,ncol=length(all.names))
  colnames(sum.mat) <- all.names

  for(i in 1:length(args)){
tmp <- args[[i]]
sum.mat[,colnames(tmp)] <- sum.mat[,colnames(tmp)] + tmp
  }

  return(sum.mat)
}

m1 <- matrix(1:20,ncol=4); colnames(m1) <- c("a","b","c","d")
m2 <- matrix(1:20,ncol=4); colnames(m2) <- c("b","c","d","e")
m3 <- matrix(1:20,ncol=4); colnames(m3) <- c("a","b","d","e")

add.by.name(m1,m2,m3)
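
For an arbitrary number of matrices, a compact alternative (a sketch of the
same idea, folded over a list with Reduce):

add2 <- function(a, b) {
  cols <- union(colnames(a), colnames(b))
  out <- matrix(0, nrow(a), length(cols), dimnames = list(NULL, cols))
  out[, colnames(a)] <- a                       # copy a's columns in
  out[, colnames(b)] <- out[, colnames(b)] + b  # add b's, matched by name
  out
}
Reduce(add2, list(m1, m2, m3))                  # same result as add.by.name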



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of murali.me...@fortisinvestments.com
Sent: Friday, March 27, 2009 9:25 AM
To: r-help@r-project.org
Subject: [R] adding matrices with common column names

folks,
 
if i have three matrices, a, b, cc with some colnames in common, and i
want to create a matrix which consists of the common columns added up,
and the other columns tacked on, what's a good way to do it? i've got
the following roundabout code for two matrices, but if the number of
matrices increases, then i'm a bit stymied.
 
> a <- matrix(1:20,ncol=4); colnames(a) <- c("a","b","c","d")
> b <- matrix(1:20,ncol=4); colnames(b) <- c("b","c","d", "e")
> cbind(a[,!(colnames(a) %in% colnames(b)), drop = FALSE],
a[,intersect(colnames(a),colnames(b))] +
b[,intersect(colnames(a),colnames(b)), drop = FALSE],
b[,!(colnames(b) %in% colnames(a)), drop = FALSE])
 
 a  b  c  d  e
[1,] 1  7 17 27 16
[2,] 2  9 19 29 17
[3,] 3 11 21 31 18
[4,] 4 13 23 33 19
[5,] 5 15 25 35 20
 
now, what if i had a matrix cc? i want to perform the above operation on
all three matrices a, b, cc.
 
> cc <- matrix(1:10,ncol=2); colnames(cc) <- c("e","f")

i need to end up with:

 a  b  c  d  e  f
[1,] 1  7 17 27 17  6
[2,] 2  9 19 29 19  7
[3,] 3 11 21 31 21  8
[4,] 4 13 23 33 23  9
[5,] 5 15 25 35 25 10

and, in general, with multiple matrices with intersecting colnames?

thanks,

murali




Re: [R] Warning messages in Splancs package :: no non-missing arguments to min; returning Inf

2009-03-30 Thread Barry Rowlingson
On Mon, Mar 30, 2009 at 7:12 AM, D  wrote:
> Hi,
>
> I would need some help with the splancs package in R.
>
> I am using a Shapefile (downloadable at)
> http://rapidshare.com/files/215206891/Redlands_Crime.zip
>
> and the following execution code
>
>
> setwd("C:\\Documents and
> Settings\\Dejan\\Desktop\\GIS\\assignment6\\DataSet_Redlands_Crime\\Redlands_Crime")
> library(foreign)
> library(splancs)
> auto_xy<-read.dbf("Auto_theft_98.dbf")
> rob_xy<-read.dbf("Robbery_98.dbf")
> auto.spp<-as.points(auto_xy$x/1000, auto_xy$y/1000)
> rob.spp<-as.points(rob_xy$x/1000, rob_xy$y/1000)
> image(kernel2d(auto.spp, bbox(auto.spp), h0=4, nx=100, ny=100),
> col=terrain.colors(10))
> pointmap(auto.spp, col="red", add=TRUE)
>
> I would need to analyze the relationship between the two Shapefiles,
> but I am receiving the following warning message and a blank output
>
>
> Xrange is  1827.026 6796.202
> Yrange is  1853.896 6832.343
> Doing quartic kernel
> Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
>
>
> Can someone tell me what I am doing wrong in the execution code?
> I am getting a blank graph.

Well, do some investigation. Does the error come from the kernel2d
function, or from the image function? Does it do that for any data
points? Have you read the help for kernel2d? Have you tried the
example in the help(kernel2d) text? Have you ever had it work? What
version of R are you using and so on. Please read the posting guide.

 The manual for kernel2d says that the second argument has to be "A
splancs polygon data set". But you've given it bbox(auto.spp), which
returns a matrix with the wrong structure - the columns are min and
max and the rows are X and Y. Plus it has only the corner points, not
all four points of the box, which splancs says it needs.
There's also a clue in your output:

 Xrange is  1827.026 6796.202
 Yrange is  1853.896 6832.343

 - but if you plot the points (which you should always do, to make
sure you've read them in properly) you should see that the X range
should be 6796 to 6832 and the Y range should be 1827 to 1853.

 Solution: use the splancs bboxx() function that converts an sp-style
bounding box to a splancs-style bounding box:

 k = kernel2d(auto.spp, bboxx(bbox(auto.spp)), h0=4, nx=100, ny=100)
 image(k)

 Barry



Re: [R] [cluster package question] What is the "sum of the dissimilarities" in the pam command ?

2009-03-30 Thread Martin Maechler
> "TG" == Tal Galili 
> on Sun, 29 Mar 2009 03:09:17 +0300 writes:

TG> Hello Martin Maechler and All,
TG> A simple question (I hope):
TG> How can I compute the "sum of the dissimilarities" that appears in the
TG> pam command (from the cluster package)?


TG> Is it the "manhattan" distance (such as the one implemented by "dist") ?


well, it first depends if  'x'  in  pam(x, k, dist, metric, ...)
is *itself* a dissimilarity object or not.
-->  help(daisy)  and  help(dist)

If it is *not*  --- which I assume from your question ---
then the answer depends on the 'metric' argument of pam().

As you did not mention that, I assume  you left 'metric' at its
default which is "euclidean", i.e.,
not "manhattan".



TG> I am asking since I am running clustering on a dataset. I found 7
TG> medoids with the pam command, and from it I have the medoid to which
TG> each observation belongs. But when I check it, I find only (about) 90%
TG> of observations have the minimum manhattan distance to the medoid that
TG> pam predicted.

TG> If this is the manhattan distance that is used, I will create some
TG> toy data to see if I can reproduce this.

Yes, specifying some reproducible toy data and specific R code
is almost always useful and typically more productive when
asking such questions by e-mail.

Regards,
Martin Maechler, ETH Zurich

TG> Thanks,
TG> Tal

TG> --


TG> My contact information:
TG> Tal Galili
TG> Phone number: 972-50-3373767
TG> FaceBook: Tal Galili
TG> My Blogs:
TG> http://www.r-statistics.com/
TG> http://www.talgalili.com
TG> http://www.biostatistics.co.il



[R] Retrieving the context

2009-03-30 Thread Fredrik Karlsson
Dear list,

I have a general problem that I really don't know how to solve efficiently
in R. Let's say we have a sequence of things, for instance a string of
words, that is stored in a file. We need all the words in a table format,
so we create an id for each word that links it to a file and to the
position of the word within the file, like:

#In this case a very short file
> strsplit("This is a text string, wich is stored in the file myfile","
")[[1]] -> mystring
#Now, store in a data.frame
> mydf <- data.frame(strings=mystring,
word_id=paste("myfile",1:length(mystring),sep="_"))
> mydf
   strings   word_id
1 This  myfile_1
2   is  myfile_2
3a  myfile_3
4 text  myfile_4
5  string,  myfile_5
6 wich  myfile_6
7   is  myfile_7
8   stored  myfile_8
9   in  myfile_9
10 the myfile_10
11file myfile_11
12  myfile myfile_12

Now, I would like to see all the words 'is' in a user defined window: so
see_context("is",mydf,1) would give
This is a
wich is stored

and see_context("is",mydf,2) would show two words before and after... and so
on.

Any ideas on how to solve this kind of problem in R?

/Fredrik


-- 
"Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it."
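
A minimal sketch of such a function, assuming the words sit in order in mydf
as above (window counts words on each side):

see_context <- function(word, df, window = 1) {
  hits <- which(df$strings == word)
  sapply(hits, function(i) {
    idx <- max(1, i - window):min(nrow(df), i + window)
    paste(df$strings[idx], collapse = " ")
  })
}
see_context("is", mydf, 1)
# [1] "This is a"      "wich is stored"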



Re: [R] Matrix max by row

2009-03-30 Thread Wacek Kusnierczyk
Rolf Turner wrote:
> I tried the following:
>
> m <- matrix(runif(10),1000,100)
> junk <- gc()
> print(system.time(for(i in 1:100) X1 <- do.call(pmax,data.frame(m))))
> junk <- gc()
> print(system.time(for(i in 1:100) X2 <- apply(m,1,max)))
>
> and got
>
>user  system elapsed
>   2.704   0.110   2.819
>user  system elapsed
>   1.938   0.098   2.040
>
> so unless there's something that I am misunderstanding (always a serious
> consideration) Wacek's apply method looks to be about 1.4 times
> *faster* than
> the do.call/pmax method.


hmm, since i was called by name (i'm grateful, rolf), i feel obliged to
check the matter myself:

# dummy data, presumably a 'large matrix'?
n = 5e3
m = matrix(rnorm(n^2), n, n)

# what is to be benchmarked...
waku = expression(matrix(apply(m, 1, max), nrow(m)))
bert = expression(do.call(pmax,data.frame(m)))

# to be benchmarked
library(rbenchmark)
benchmark(replications=10, order='elapsed', columns=c('test',
'elapsed'),
   waku=matrix(apply(m, 1, max), nrow(m)),
   bert=do.call(pmax,data.frame(m)))

takes quite a while, but here you go:

#   test elapsed
# 1 waku  11.838
# 2 bert  20.833

where bert's solution seems to require a wonder to 'be considerably
faster for large matrices'.  to be fair, i also did

# to be benchmarked
library(rbenchmark)
benchmark(replications=10, order='elapsed', columns=c('test',
'elapsed'),
   bert=do.call(pmax,data.frame(m)),
   waku=matrix(apply(m, 1, max), nrow(m)))

#  test elapsed
# 2 waku  11.695
# 1 bert  20.912
   
take home point: a good product sells itself, a bad product may not sell
despite aggressive marketing.

rolf, thanks for pointing this out.

cheers,
vQ


> cheers,
>
> Rolf Turner
>
>
> On 30/03/2009, at 3:55 PM, Bert Gunter wrote:
>
>> If speed is a consideration,availing yourself of the built-in pmax()
>> function via
>>
>> do.call(pmax,data.frame(yourMatrix))
>>
>> will be considerably faster for large matrices.
>>
>> If you are puzzled by why this works, it is a useful exercise in R to
>> figure
>> it out.
>>
>> Hint:The man page for ?data.frame says:
>> "A data frame is a list of variables of the same length with unique row
>> names, given class 'data.frame'."
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>> Genentech Nonclinical Statistics
>>
>> -Original Message-
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On
>> Behalf Of Wacek Kusnierczyk
>> Sent: Saturday, March 28, 2009 5:22 PM
>> To: Ana M Aparicio Carrasco
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Matrix max by row
>>
>> Ana M Aparicio Carrasco wrote:
>>> I need help about how to obtain the max by row in a matrix.
>>> For example if I have the following matrix:
>>> 2 5 3
>>> 8 7 2
>>> 1 8 4
>>>
>>> The max by row will be:
>>> 5
>>> 8
>>> 8
>>>
>>
>> matrix(apply(m, 1, max), nrow(m))
>>
>> vQ
>>



Re: [R] "[.data.frame" and lapply

2009-03-30 Thread Wacek Kusnierczyk

> Bert Gunter wrote:
>   
>> "Note that these operations do not match their index arguments in the
>> standard way: argument names are ignored and positional matching only is
>> used. So m[j=2,i=1] is equivalent to m[2,1] and not to m[1,2]. "
>>
>> ## Note that the next lines immediately following say:
>>
>> "This may not be true for methods defined for them; for example it is not
>> true for the data.frame methods described in [.data.frame. 
>>
>> To avoid confusion, do not name index arguments (but drop and exact must be
>> named). "
>>
>> So, while it may be fair to characterize the md[,i=3] as a design flaw, it
>> is both explicitly pointed out and warned against. Note that,of course
>>
>> md[,3]
>> ## 3rd column, good practice
>> md[,j=3]
>> ## also 3rd column .. but warned against as bad practice
>>
>> Whether a behavior should be considered a "bug" if it is explicitly warned
>> against in the docs, I leave for others to decide. Too deep for me. 
>> 

in my humble opinion, if the above (and the previously discussed) is
neither a bug nor a design flaw, then it must be an intentional
misfeature designed specifically to confuse users.  have your take.

vQ
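
A small illustration of the quirk under discussion (standard R argument
matching; run it to verify):

m <- matrix(1:6, nrow = 2)
m[j = 2, i = 1]    # 2, i.e. m[2, 1]: names are ignored by the default method
md <- data.frame(a = 1:3, b = 4:6)
md[, j = 2]        # second column, as the name suggests
md[, i = 2]        # second ROW: 'i = 2' binds to i, the empty index to j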



Re: [R] interpreting "not defined because of singularities" in lm

2009-03-30 Thread Duncan Murdoch

jibleriz...@yahoo.com wrote:

I run lm to fit an OLS model where one of the covariates is a factor with 30 
levels. I use contr.treatment() to set the base level of the factor, so when I 
run lm() no coefficients are estimated for that level. But in addition (and 
regardless of which level I choose to be the base), lm also gives a vector of 
NA coefficients for another level of my factor.

The output says that these coefficients were "not defined because of 
singularities", suggesting maybe that the 28 estimated coefficients are sufficient 
to pin down the 29th... but why is this the case? Why am I going from 30 levels to 28 
coefficients? Am I misunderstanding the way factors/levels are supposed to work?
The usual cause of this is that one of the levels is not present in the 
data set.  Another possibility is collinearity with some other covariate 
in your model.
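
A tiny example of the first cause: an empty level yields an all-zero column
in the model matrix, hence an NA coefficient.

f <- factor(c("a", "b", "b", "c"), levels = c("a", "b", "c", "d"))
y <- c(1.2, 0.7, 1.1, 0.4)
coef(lm(y ~ f))    # fd is NA: level "d" never occurs in the data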


Duncan Murdoch



Re: [R] Constrined dependent optimization.

2009-03-30 Thread Paul Smith
On Sun, Mar 29, 2009 at 9:45 PM,   wrote:
> I have an optimization question that I was hoping to get some suggestions on
> how best to go about solving it. I would think there is probably a package
> that addresses this problem.
>
> This is an ordering optimization problem. Best to describe it with a simple
> example. Say I have 100 "bins" each with a ball in it numbered from 1 to 100.
> Each bin can only hold one ball. This optimization is that I have a function
> 'f' that takes this array of bins and returns a number. The number returned
> from f(1,2,3,4) would differ from that of f(2,1,3,4). The optimization is
> finding the optimum order of these balls so as to produce a minimum value
> from 'f'. I cannot use the regular 'optim' algorithms because a) the values
> are discrete, and b) the values are dependent, i.e. when the "variable"
> representing the bin location is changed (in this example a new ball is put
> there) the existing ball will need to be moved to another bin (probably
> swapping positions), and c) each "variable" is constrained; in the example
> above the only allowable values are integers from 1-100. So the problem
> becomes finding the optimum order of the "balls".
>
> Any suggestions?

If your function f is linear, then you can use lpSolve.
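
For a non-linear f, one possibility (a sketch, with a hypothetical objective
standing in for the real f) is simulated annealing over permutations via
optim's "SANN" method, which accepts a custom move function:

f <- function(p) sum(p * seq_along(p))  # hypothetical objective to minimize
swap <- function(p) {                   # move: swap two randomly chosen bins
  i <- sample(length(p), 2)
  p[rev(i)] <- p[i]
  p
}
res <- optim(sample(100), f, gr = swap, method = "SANN",
             control = list(maxit = 20000, temp = 10))
res$value                               # best value of f found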

Paul



Re: [R] how to input multiple .txt files

2009-03-30 Thread Mike Lawrence
oops, didn't read the question fully. If you want to create 2 master files:

cust1_files = list.files(path=path_to_my_files,pattern='cust1',full.names=TRUE)
a=NULL
for(this_file in cust1_files){
   a=rbind(a,read.table(this_file))
}
write.table(a,'cust1.master.txt')

cust2_files = list.files(path=path_to_my_files,pattern='cust2',full.names=TRUE)
a=NULL
for(this_file in cust2_files){
   a=rbind(a,read.table(this_file))
}
write.table(a,'cust2.master.txt')


On Mon, Mar 30, 2009 at 8:55 AM, Mike Lawrence  wrote:
> my_files = list.files(path=path_to_my_files,pattern='.txt',full.names=TRUE)
>
> a=NULL
> for(this_file in my_files){
>        a=rbind(a,read.table(this_file))
> }
> write.table(a,my_new_file_name)
>
>
>
>
> On Sun, Mar 29, 2009 at 10:37 PM, Qianfeng Li  wrote:
>>
>>
>> how to input multiple .txt files?
>>
>> A data folder has lots of .txt files from different customers.
>>
>> Want to read all these .txt files to different master files:
>>
>> such as:
>>
>>  cust1.xx.txt,  cust1.xxx.txt, cust1..txt,.. to master file: 
>> X.txt
>>
>>  cust2.xx.txt,  cust2.xxx.txt, cust2..txt,.. to master file: 
>> Y.txt
>>
>>
>> Thanks!
>>
>>
>>
>
>
>
> --
> Mike Lawrence
> Graduate Student
> Department of Psychology
> Dalhousie University
>
> Looking to arrange a meeting? Check my public calendar:
> http://tinyurl.com/mikes-public-calendar
>
> ~ Certainty is folly... I think. ~
>



-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tinyurl.com/mikes-public-calendar

~ Certainty is folly... I think. ~



Re: [R] Constrined dependent optimization.

2009-03-30 Thread rkevinburton
It would in the strictest sense be non-linear since it is only defined for 
discrete integer values for each variable. And in general it would be 
non-linear anyway. If I only have three variables which can take on values 
1,2,3 then f(1,2,3) could equal 0 and f(2,1,3) could equal 10.

Thank you for the suggestions.

Kevin

 Paul Smith  wrote: 
> On Sun, Mar 29, 2009 at 9:45 PM,   wrote:
> > I have an optimization question that I was hoping to get some suggestions
> > on how best to go about solving it. I would think there is probably a
> > package that addresses this problem.
> >
> > This is an ordering optimization problem. Best to describe it with a simple
> > example. Say I have 100 "bins" each with a ball in it numbered from 1 to
> > 100. Each bin can only hold one ball. This optimization is that I have a
> > function 'f' that takes this array of bins and returns a number. The number
> > returned from f(1,2,3,4) would differ from that of f(2,1,3,4). The
> > optimization is finding the optimum order of these balls so as to produce a
> > minimum value from 'f'. I cannot use the regular 'optim' algorithms because
> > a) the values are discrete, and b) the values are dependent, i.e. when the
> > "variable" representing the bin location is changed (in this example a new
> > ball is put there) the existing ball will need to be moved to another bin
> > (probably swapping positions), and c) each "variable" is constrained; in
> > the example above the only allowable values are integers from 1-100. So the
> > problem becomes finding the optimum order of the "balls".
> >
> > Any suggestions?
> 
> If your function f is linear, then you can use lpSolve.
> 
> Paul
> 



[R] Odp: How to generate a new column according to some rule?

2009-03-30 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 30.03.2009 05:17:35:

> In a data frame I have a column "date" and a column "time",now I want
> to generate a new column which is the mean of the value of time group
> by date. In stata the command is
> 
> egen scalls = mean(time),by(date)
> 
> but I don't know the command in R, can anybody help me?

In R you can use 

?tapply
?by
?aggregate

from base, or functions from the doBy package.
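
The most direct equivalent of the Stata line, though, is probably ave(),
whose default FUN is mean. A minimal sketch with hypothetical values:

df <- data.frame(date = c("d1", "d1", "d2"), time = c(2, 4, 10))
df$scalls <- ave(df$time, df$date)   # group means, repeated per row
df
#   date time scalls
# 1   d1    2      3
# 2   d1    4      3
# 3   d2   10     10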

Regards
Petr



> 



Re: [R] Column name assignment problem

2009-03-30 Thread Steve Murray

Jim and all,

Thanks - I managed to get it working based on your helpful advice.

I'm now trying to do something very similar which simply involves changing the 
names of the variables in column 1 to make them more succinct. I'm trying to do 
this via the 'levels' command as I figured that I might be able to apply the 
character strings in a similar way to how you recommended when dealing with 
'colnames'.


# Refine names of rivers to make more succinct
  riv_names <- get(paste("arunoff_",table_year, sep=''))[,1]
  levels(riv_names) <- c("AMAZON", "AMUR", "CONGO", "LENA", 
"MISSISSIPPI", "NIGER", "NILE", "OB", "PARANA", "YANGTZE", "YENISEI", "ZAMBEZI")
  assign(get(paste("arunoff_",table_year, sep='')[,1], 
levels(riv_names)))

Error in paste("arunoff_", table_year, sep = "")[, 1] : 
  incorrect number of dimensions

My thinking was to assign the levels of riv_names to column 1 of the table...
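
The error arises because the [,1] is applied to the string returned by
paste(), not to the object it names. A sketch of the likely intent (assuming
arunoff_<year> is a data frame whose first column is a factor):

tab <- get(paste("arunoff_", table_year, sep = ""))   # fetch the object itself
levels(tab[, 1]) <- c("AMAZON", "AMUR", "CONGO", "LENA", "MISSISSIPPI",
                      "NIGER", "NILE", "OB", "PARANA", "YANGTZE",
                      "YENISEI", "ZAMBEZI")
assign(paste("arunoff_", table_year, sep = ""), tab)  # write it back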

Many thanks again for any advice offered,

Steve



Re: [R] how to input multiple .txt files

2009-03-30 Thread Mike Lawrence
my_files = list.files(path=path_to_my_files,pattern='.txt',full.names=TRUE)

a=NULL
for(this_file in my_files){
a=rbind(a,read.table(this_file))
}
write.table(a,my_new_file_name)




On Sun, Mar 29, 2009 at 10:37 PM, Qianfeng Li  wrote:
>
>
> how to input multiple .txt files?
>
> A data folder has lots of .txt files from different customers.
>
> Want to read all these .txt files to different master files:
>
> such as:
>
>  cust1.xx.txt,  cust1.xxx.txt, cust1..txt,.. to master file: 
> X.txt
>
>  cust2.xx.txt,  cust2.xxx.txt, cust2..txt,.. to master file: 
> Y.txt
>
>
> Thanks!
>
>
>



-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tinyurl.com/mikes-public-calendar

~ Certainty is folly... I think. ~



Re: [R] Column name assignment problem

2009-03-30 Thread Steve Murray

Dear all,

Apologies for yet another question (!). Hopefully it won't be too tricky to 
solve. I am attempting to add row and column names (these are in fact numbers) 
to each of the tables created by the code (120 in total).


# Create index of file names
files <- print(ls()[1:120], quote=FALSE)  # This is the best way I could manage 
to successfully attribute all the table names to a single list - I realise it's 
horrible coding (especially as it relies on the first 120 objects stored in the 
memory actually being the objects I want to use)...

files
  [1] "Fekete_198601" "Fekete_198602" "Fekete_198603" "Fekete_198604"
  [5] "Fekete_198605" "Fekete_198606" "Fekete_198607" "Fekete_198608"
  [9] "Fekete_198609" "Fekete_198610" "Fekete_198611" "Fekete_198612"
  [13] "Fekete_198701" "Fekete_198702" "Fekete_198703" "Fekete_198704"
  [17] "Fekete_198705" "Fekete_198706" "Fekete_198707" "Fekete_198708" 
...[truncated - there are 120 in total]


# Provide column and row names according to lat/longs.

rnames <- sprintf("%.2f", seq(from = -89.75, to = 89.75, length = 360))
columnnames <- sprintf("%.2f", seq(from = -179.75, to = 179.75, length = 720))

for (i in files) {
assign(colnames((paste(Fekete_",index$year[i], index$month[i])", 
sep='')), columnnames)
assign(rownames(paste("rownames(Fekete_",index$year[i], 
index$month[i],")", sep=''), rnames))
}


Error: unexpected string constant in:
"for (i in files) {
assign(colnames((paste(Fekete_",index$year[i], index$month[i])""
> assign(rownames(paste("rownames(Fekete_",index$year[i], 
> index$month[i],")", sep=''), rnames))
Error in if (do.NULL) NULL else if (nr> 0) paste(prefix, seq_len(nr),  : 
  argument is not interpretable as logical
In addition: Warning message:
In if (do.NULL) NULL else if (nr> 0) paste(prefix, seq_len(nr),  :
  the condition has length> 1 and only the first element will be used
> }
Error: unexpected '}' in "}"



Is there a more elegant way of creating a list of file names in this case 
(remember that there are 2 variable parts to each name) which would facilitate 
the assigning of column and row names to each table? (And make life easier when 
doing other things with the data, e.g. plotting...!).
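
One hedged possibility: keep the 120 tables in a named list rather than as
120 free objects (this sketch assumes files, rnames and columnnames as
defined above):

fekete <- mget(files, envir = .GlobalEnv)     # gather the existing objects
fekete <- lapply(fekete, function(tab) {
  dimnames(tab) <- list(rnames, columnnames)  # lat/long labels
  tab
})
fekete[["Fekete_198601"]][1:5, 1:5]           # e.g. inspect one table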

Many thanks once again - the help offered really is appreciated.

Steve


_
All your Twitter and other social updates in one place 



[R] Burt table from word frequency list

2009-03-30 Thread Alan Zaslavsky
Maybe not terribly hard, depending on exactly what you need.  Suppose you 
turn your text into a character vector 'mytext' of words.  Then for a 
table of words appearing delta words apart (ordered), you can table mytext 
against itself with a lag:


nwords=length(mytext)
burttab=table(mytext[-(1:delta)],mytext[-(nwords+1-(1:delta))])

Add to its transpose and sum over delta up to your maximum distance apart. 
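
A sketch of that accumulation on toy input (maxdelta is the largest distance
counted; fixing the factor levels keeps the tables conformable):

mytext <- c("the", "cat", "chased", "the", "greedy", "rat")
maxdelta <- 2
nwords <- length(mytext)
words <- sort(unique(mytext))
burt <- matrix(0, length(words), length(words), dimnames = list(words, words))
for (delta in 1:maxdelta) {
  tab <- table(factor(mytext[-(1:delta)], levels = words),
               factor(mytext[-(nwords + 1 - (1:delta))], levels = words))
  burt <- burt + tab + t(tab)
}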
If you want only words appearing near each other within the same sentence 
(or some other unit), pad out the sentence break with at least delta 
instances of a dummy spacer:


the cat chased the greedy rat SPACER SPACER SPACER the dog chased the
clever cat

This will count all pairings at distance delta; if you want to count only 
those for which this was the NEAREST co-occurrence (so


the cat and the rat chased the dog

would count as two at delta=3 but not one at delta=6) it will be trickier 
and I'm not sure this approach can be modified to handle it.



Date: Sun, 29 Mar 2009 22:20:15 -0400
From: "Murray Cooper" 
Subject: Re: [R] Burt table from word frequency list

The usual approach is to count the co-occurrence within so many words of
each other.  Typical is between 5 words before and 5 words after a
given word.  So for each word in the document, you look for the
occurrence of all other words within -5 -4 -3 -2 -1 0 1 2 3 4 5 words.
Depending on the language and the question being asked certain words
may be excluded.

This is not a simple function! I don't know if anyone has done a
package for this type of analysis, but with over 2000 packages floating
around you might get lucky.




Re: [R] how to input multiple .txt files

2009-03-30 Thread baptiste auguie

may i suggest the following,


a <- do.call(rbind, lapply(cust1_files, read.table))

(i believe expanding objects in a for loop belongs to the R Inferno)

baptiste

On 30 Mar 2009, at 12:58, Mike Lawrence wrote:



cust1_files =  
list.files(path=path_to_my_files,pattern='cust1',full.names=TRUE)

a=NULL
for(this_file in cust1_files){
  a=rbind(a,read.table(this_file))
}
write.table(a,'cust1.master.txt')


_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag



Re: [R] which rows are duplicates?

2009-03-30 Thread Wacek Kusnierczyk
Michael Dewey wrote:
> At 05:07 30/03/2009, Aaron M. Swoboda wrote:
>> I would like to know which rows are duplicates of each other, not
>> simply that a row is a duplicate of another row. In the following
>> example rows 1 and 3 are duplicates.
>>
>> > x <- c(1,3,1)
>> > y <- c(2,4,2)
>> > z <- c(3,4,3)
>> > data <- data.frame(x,y,z)
>> x y z
>> 1 1 2 3
>> 2 3 4 4
>> 3 1 2 3
>

i don't have any solution significantly better than what you have
already been given.  but i have a warning instead.

in the below, you use both 'duplicated' and 'unique' on data frames, and
the proposed solution relies on the latter.  you may want to try to
avoid both when working with data frames;  this is because of how they
do (or don't) work.

duplicated (and unique, which calls duplicated) simply pastes the
content of each row into a *string*, and then works on the strings. 
this means that NAs in the data frame are converted to "NA"s, and "NA"
== "NA", obviously, so that rows that include NAs and are otherwise
identical will be considered *identical*.

that's not bad (yet), but you should be aware.  however, duplicated has
a parameter named 'incomparables', explained in ?duplicated as follows:

"
incomparables: a vector of values that cannot be compared. 'FALSE' is a
  special value, meaning that all values can be compared, and
  may be the only value accepted for methods other than the
  default.  It will be coerced internally to the same type as
  'x'.
"

and also

"
 Values in 'incomparables' will never be marked as duplicated. This
 is intended to be used for a fairly small set of values and will
 not be efficient for a very large set.
"

that is, for example:

vector = c(NA, NA)
duplicated(vector)
# [1] FALSE TRUE
duplicated(vector, incomparables=NA)
# [1] FALSE FALSE

list = list(NA, NA)
duplicated(list)
# [1] FALSE TRUE
duplicated(list, incomparables=NA)
# [1] FALSE FALSE


what the documentation *fails* to tell you is that the parameter
'incomparables' is defunct in duplicated.data.frame, which you can see
in its source code (below), or in the following example:

# data as above, or any data frame
duplicated(data, incomparables=NA)
# Error in if (!is.logical(incomparables) || incomparables)
.NotYetUsed("incomparables != FALSE") :
#   missing value where TRUE/FALSE needed

the error message here is *confusing*.  the error is raised because the
author of the code made a mistake and apparently hasn't carefully
examined and tested his product;  the code goes:

duplicated.data.frame
# function (x, incomparables = FALSE, fromLast = FALSE, ...)
# {
#if (!is.logical(incomparables) || incomparables)
#.NotYetUsed("incomparables != FALSE")
#duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
# }
# 

clearly, the intention here is to raise an error with a (still hardly
clear) message as in:

.NotYetUsed("incomparables != FALSE")
# Error: argument 'incomparables != FALSE' is not used (yet)

but instead, if(NA) is evaluated (because '!is.logical(NA) || NA'
evaluates, *obviously*, to NA) and hence the uninformative error message.

take home point:  rtfm, *but* don't believe it.
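
as a postscript, the original question ("which row does row 3 duplicate?")
has a compact answer via the same paste trick:

key <- do.call(paste, c(data, sep = "\r"))
match(key, key)   # for each row, the index of its first identical row
# [1] 1 2 1       -- rows 1 and 3 coincide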

vQ


> Does this do what you want?
> > x <- c(1,3,1)
> > y <- c(2,4,2)
> > z <- c(3,4,3)
> > data <- data.frame(x,y,z)
> > data.u <- unique(data)
> > data.u
>   x y z
> 1 1 2 3
> 2 3 4 4
> > data.u <- cbind(data.u, set = 1:nrow(data.u))
> > merge(data, data.u)
>   x y z set
> 1 1 2 3   1
> 2 1 2 3   1
> 3 3 4 4   2
>
> You need to do a bit more work to get them back into the original row
> order if that is essential.
>
>
>
>> I can't figure out how to get R to tell me that observation 1 and 3
>> are the same.  It seems like the "duplicated" and "unique" functions
>> should be able to help me out, but I am stumped.
>>
>> For instance, if I use "duplicated" ...
>>
>> > duplicated(data)
>> [1] FALSE FALSE TRUE
>>
>> it tells me that row 3 is a duplicate, but not which row it matches.
>> How do I figure out WHICH row it matches?
>>
>> And If I use "unique"...
>>
>> > unique(data)
>> x y z
>> 1 1 2 3
>> 2 3 4 4
>>
>> I see that rows 1 and 2 are unique, leaving me to infer that row 3 was
>> a duplicate, but again it doesn't tell me which row it was a duplicate
>> of (as far as I can tell). Am I missing something?
>>
>> How can I determine that row 3 is a duplicate OF ROW 1?
>>
>> Thanks,
>>
>> Aaron
>>
>>
>
> Michael Dewey
> http://www.aghmed.fsnet.co.uk
>


-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875

Re: [R] Constrined dependent optimization.

2009-03-30 Thread Paul Smith
I do not really understand your argument regarding the non-linearity
of f. Perhaps it would help us a lot if you defined concretely your
objective function or gave us a minimal example fully detailed and
defined.

Paul


On Mon, Mar 30, 2009 at 1:16 PM,   wrote:
> It would in the strictest sense be non-linear since it is only defined for
> discrete integer values for each variable. And in general it would be
> non-linear anyway. If I only have three variables which can take on values 
> 1,2,3 then f(1,2,3) could equal 0 and f(2,1,3) could equal 10.
>
> Thank you for the suggestions.
>
> Kevin
>
>  Paul Smith  wrote:
>> On Sun, Mar 29, 2009 at 9:45 PM,   wrote:
>> > I have an optimization question that I was hoping to get some suggestions
>> > on how best to go about solving it. I would think there is probably a
>> > package that addresses this problem.
>> >
>> > This is an ordering optimization problem. Best to describe it with a
>> > simple example. Say I have 100 "bins" each with a ball in it numbered from
>> > 1 to 100. Each bin can only hold one ball. This optimization is that I
>> > have a function 'f' that takes this array of bins and returns a number.
>> > The number returned from f(1,2,3,4) would differ from that of f(2,1,3,4).
>> > The optimization is finding the optimum order of these balls so as to
>> > produce a minimum value from 'f'. I cannot use the regular 'optim'
>> > algorithms because a) the values are discrete, and b) the values are
>> > dependent, i.e. when the "variable" representing the bin location is
>> > changed (in this example a new ball is put there) the existing ball will
>> > need to be moved to another bin (probably swapping positions), and c)
>> > each "variable" is constrained; in the example above the only allowable
>> > values are integers from 1-100. So the problem becomes finding the
>> > optimum order of the "balls".
>> >
>> > Any suggestions?
>>
>> If your function f is linear, then you can use lpSolve.
>>
>> Paul
>>
>
>



[R] what is R equivalent of Fortran DOUBLE PRECISION ?

2009-03-30 Thread mauede
I noticed that R cannot understand certain Fortran real constant formats. For
instance:

 c14<- as.double( 7.785205408500864D-02)
Error: unexpected symbol in " c14<- as.double( 7.785205408500864D"

The above "D" is used in Fortran language to indicate the memory starage mode. 
That is for instructing Fortran compiler 
to store such a REAL constant in  DOUBLE PRECISION... am I right ?
Since R cannot undestand numerical conatant post-fixed by the letter "D", I 
wonder how I can instruct R interpreter to 
store such a numerical constant reserving as muh memory as necessary so as to 
accommodate a double precision number.

I noticed R accepts the following syntax, but I do not know if I have achieved
my goal this way:

>  c14<- as.double( 7.785205408500864E-02)
> typeof(c14)
[1] "double"

My questions are: what is the best precision I can get with R when dealing with
real numbers?
Is R's "double" type equivalent to Fortran DOUBLE PRECISION for internal number
representation?
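
For reference, R's numeric/"double" type is the C double (8 bytes), which
matches Fortran DOUBLE PRECISION on common platforms; the constant is written
with E (or e) in place of Fortran's D. A quick check:

c14 <- 7.785205408500864e-02
typeof(c14)             # "double"
.Machine$double.eps     # about 2.22e-16: relative machine precision
print(c14, digits = 17) # full stored precision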

Thank you very much.
Maura



[R] (no subject)

2009-03-30 Thread ankhee dutta
Hi, All
 I have a Linux system (Mandriva 2007) with R version 2.3.0 and MySQL 5.0.0.
I have also got the DBI R database interface, version 0.1-11, installed on my
Linux system. While installing the RMySQL package, version 0.5-11, I am facing
the problem mentioned below.



* Installing *source* package 'RMySQL' ...
creating cache ./config.cache
checking how to run the C preprocessor... cc -E
checking for compress in -lz... yes
checking for getopt_long in -lc... yes
checking for mysql_init in -lmysqlclient... no
checking for mysql.h... no
checking for mysql_init in -lmysqlclient... no
checking for mysql_init in -lmysqlclient... no
checking for mysql_init in -lmysqlclient... no
checking for mysql_init in -lmysqlclient... no
checking for mysql_init in -lmysqlclient... no
checking for /usr/local/include/mysql/mysql.h... no
checking for /usr/include/mysql/mysql.h... no
checking for /usr/local/mysql/include/mysql/mysql.h... no
checking for /opt/include/mysql/mysql.h... no
checking for /include/mysql/mysql.h... no

Configuration error:
  could not find the MySQL installation include and/or library
  directories.  Manually specify the location of the MySQL
  libraries and the header files and re-run R CMD INSTALL.

INSTRUCTIONS:

1. Define and export the 2 shell variables PKG_CPPFLAGS and
   PKG_LIBS to include the directory for header files (*.h)
   and libraries, for example (using Bourne shell syntax):

  export PKG_CPPFLAGS="-I"
  export PKG_LIBS="-L -lmysqlclient"

   Re-run the R INSTALL command:

  R CMD INSTALL RMySQL_.tar.gz

2. Alternatively, you may pass the configure arguments
  --with-mysql-dir= (distribution directory)
   or
  --with-mysql-inc= (where MySQL header files reside)
  --with-mysql-lib= (where MySQL libraries reside)
   in the call to R INSTALL --configure-args='...'

   R CMD INSTALL --configure-args='--with-mysql-dir=DIR'
RMySQL_.tar.gz

ERROR: configuration failed for package 'RMySQL'
** Removing '/usr/lib/R/library/RMySQL'





Any help will be great.
Thank you in advance.

-- 
Ankhee Dutta
project trainee,
JNU,New Delhi-67

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Self Organizing Map

2009-03-30 Thread glaporta

Dear list,
I really appreciate the previous suggestions about self-organizing maps. I tried
to perform SOM analyses with the kohonen, som and class packages, but it's not
clear to me whether these packages are able to: 1) cluster neurons according
to their similarities (U-matrix); 2) assign variable names to SOM neurons;
3) define the importance of each variable, as in figures 2, 3 and 5 of
http://dx.doi.org/10.1016/j.ecolmodel.2005.10.044 for example.

Thanks so much! 
Sincerely, Gianandrea
-- 
View this message in context: 
http://www.nabble.com/Self-Organizing-Map-tp22778862p22778862.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Wrong path to user defined library for the R Help Files

2009-03-30 Thread Breitbach, Nils
Dear R-Community,

since I work on a PC at the University I do not have the necessary rights for all 
devices, and therefore my library is located on a network device. The installation 
process worked and everything is right apart from one little thing - the help 
files. When I try to search with the function "?helptopic" I always get a URL 
error. The problem is obvious from the error message, because it gives the path 
where R tries to find the help files. R mixes two paths, in that it uses 
the default path of the home directory followed by my user-defined path given 
via .libPaths. How can I give R the information about the right path, without it 
using the default path and mixing both up when searching the help files? Can I 
simply add a line in the Rprofile.site file?
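
A sketch of such a line, with a placeholder path standing in for the actual
network location of the library -- prepending it makes R search the personal
library (and its help files) first:

## in Rprofile.site; "N:/R/library" is a placeholder for the net device path
.libPaths(c("N:/R/library", .libPaths()))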

I do not know if this is a problem, but my personal working directory is 
different from my personal library path.

Thanks in advance ...

Cheers,

Nils

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] PLS package loading error!

2009-03-30 Thread mienad

Hi,

I am using R version 2.8.1 on Windows with RGui. I have installed the latest
version of the pls package (2.1-0). When I try to load this package in R using
the library(pls) command, the following error message appears:

Erreur dans library(pls) : 
  'pls' n'est pas un package valide -- a-t-il été installé < 2.0.0 ?
(in English: 'pls' is not a valid package -- was it installed < 2.0.0?)

Could you please help me to solve this problem?

Regards 

Damien
-- 
View this message in context: 
http://www.nabble.com/PLS-package-loading-error%21-tp22780027p22780027.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] "[.data.frame" and lapply

2009-03-30 Thread Wacek Kusnierczyk
Bert Gunter wrote:
> Folks:
>
> I do not wish to agree or disagree with the criticisms of either the speed
> or possible design flaws of "[". But let's at least see what the docs say
> about the issues, using the simple example you provided:
>
>
> m = matrix(1:9, 3, 3)
> md = data.frame(m)
>
> md[1]
> # the first column
> ## as documented. This is because a data frame is a list of 3 identical
> ## length columns, and this is how [ works for lists
>
> m[1]
> # the first element (i.e., m[1,1])
> ## as documented. A matrix is just a vector with a dim attribute and 
> ## this is how [ works for vectors
>
> md[,i=3]
> # third row
> ## See below
>
> m[,i=3]
> # third column
> ##  Correct, as documented in ?"[" for matrices, to wit:
> "Note that these operations do not match their index arguments in the
> standard way: argument names are ignored and positional matching only is
> used. So m[j=2,i=1] is equivalent to m[2,1] and not to m[1,2]. "
>
> ## Note that the next lines immediately following say:
>
> "This may not be true for methods defined for them; for example it is not
> true for the data.frame methods described in [.data.frame. 
>
> To avoid confusion, do not name index arguments (but drop and exact must be
> named). "
>
> So, while it may be fair to characterize the md[,i=3] as a design flaw, it
> is both explicitly pointed out and warned against. Note that,of course
>
> md[,3]
> ## 3rd column, good practice
> md[,j=3]
> ## also 3rd column .. but warned against as bad practice
>
> Whether a behavior should be considered a "bug" if it is explicitly warned
> against in the docs, I leave for others to decide. Too deep for me. 
>   

ok, there may be a point here.  but comments such as the above quotes
from ?'[' provide evidence that the design is chaotic, with lots of
non-obvious exceptions, explained somewhere there,
please-read-every-single-letter-in-tfm. 

furthermore, what is "This may not be true for methods defined for them"
supposed to tell a user trying to get an understanding of what will
happen if certain constructs are used?  and from what you quote, it
seems that the statement about ignored argument names (i.e., the index
names 'i' and 'j') is *not* applicable to [.data.frame.  it seems quite
clear to me.  and "To avoid confusion, do not name index arguments"
would better specify whose confusion is meant -- apparently, it is r's
implementation that is confused here.

best,
vQ

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] which rows are duplicates?

2009-03-30 Thread Michael Dewey

At 05:07 30/03/2009, Aaron M. Swoboda wrote:

I would like to know which rows are duplicates of each other, not
simply that a row is a duplicate of another row. In the following
example rows 1 and 3 are duplicates.

> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
x y z
1 1 2 3
2 3 4 4
3 1 2 3


Does this do what you want?
> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
> data.u <- unique(data)
> data.u
  x y z
1 1 2 3
2 3 4 4
> data.u <- cbind(data.u, set = 1:nrow(data.u))
> merge(data, data.u)
  x y z set
1 1 2 3   1
2 1 2 3   1
3 3 4 4   2

You need to do a bit more work to get them back into the original row 
order if that is essential.
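
An alternative sketch that keeps the original row order: paste each row into
a single key string and let match() report, for each row, the index of the
first row it duplicates:

key <- do.call(paste, c(data, sep = "\r"))   # one key string per row
match(key, key)
## [1] 1 2 1   -- row 3 first occurs as row 1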





I can't figure out how to get R to tell me that observations 1 and 3
are the same.  It seems like the "duplicated" and "unique" functions
should be able to help me out, but I am stumped.

For instance, if I use "duplicated" ...

> duplicated(data)
[1] FALSE FALSE TRUE

it tells me that row 3 is a duplicate, but not which row it matches.
How do I figure out WHICH row it matches?

And If I use "unique"...

> unique(data)
x y z
1 1 2 3
2 3 4 4

I see that rows 1 and 2 are unique, leaving me to infer that row 3 was
a duplicate, but again it doesn't tell me which row it was a duplicate
of (as far as I can tell). Am I missing something?

How can I determine that row 3 is a duplicate OF ROW 1?

Thanks,

Aaron




Michael Dewey
http://www.aghmed.fsnet.co.uk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Column name assignment problem

2009-03-30 Thread Peter Dalgaard
Steve Murray wrote:
> Dear all,
> 
> Apologies for yet another question (!). Hopefully it won't be too tricky to 
> solve. I am attempting to add row and column names (these are in fact 
> numbers) to each of the tables created by the code (120 in total).
> 
> 
> # Create index of file names
> files <- print(ls()[1:120], quote=FALSE)  # This is the best way I could 
> manage to successfully attribute all the table names to a single list - I 
> realise it's horrible coding (especially as it relies on the first 120 
> objects stored in the memory actually being the objects I want to use)...
> 
> files
>   [1] "Fekete_198601" "Fekete_198602" "Fekete_198603" "Fekete_198604"
>   [5] "Fekete_198605" "Fekete_198606" "Fekete_198607" "Fekete_198608"
>   [9] "Fekete_198609" "Fekete_198610" "Fekete_198611" "Fekete_198612"
>   [13] "Fekete_198701" "Fekete_198702" "Fekete_198703" "Fekete_198704"
>   [17] "Fekete_198705" "Fekete_198706" "Fekete_198707" "Fekete_198708" 
> ...[truncated - there are 120 in total]
> 
> 
> # Provide column and row names according to lat/longs.
> 
> rnames <- sprintf("%.2f", seq(from = -89.75, to = 89.75, length = 360))
> columnnames <- sprintf("%.2f", seq(from = -179.75, to = 179.75, length = 720))
> 
> for (i in files) {
> assign(colnames((paste(Fekete_",index$year[i], index$month[i])", 
> sep='')), columnnames)
> assign(rownames(paste("rownames(Fekete_",index$year[i], 
> index$month[i],")", sep=''), rnames))
> }
> 
> 
> Error: unexpected string constant in:
> "for (i in files) {
> assign(colnames((paste(Fekete_",index$year[i], index$month[i])""
>> assign(rownames(paste("rownames(Fekete_",index$year[i], 
>> index$month[i],")", sep=''), rnames))
> Error in if (do.NULL) NULL else if (nr> 0) paste(prefix, seq_len(nr),  : 
>   argument is not interpretable as logical
> In addition: Warning message:
> In if (do.NULL) NULL else if (nr> 0) paste(prefix, seq_len(nr),  :
>   the condition has length> 1 and only the first element will be used
>> }
> Error: unexpected '}' in "}"


The generic issue here (read: I can't really be bothered to do your
problem in all details...) is that you cannot use assignment forms like

foo(x) <- bar

while accessing x via a character name. That is

a <- "plugh!"
assign(foo(a), bar)

and

foo(get(a)) <- bar

are both wrong.

You need to do it in steps, like

x <- get(a)
foo(x) <- bar
assign(a, x)

or, not really any prettier

eval(substitute(
   foo(x) <- bar, list(x = as.name(a))
))


> 
> 
> Is there a more elegant way of creating a list of file names in this case 
> (remember that there are 2 variable parts to each name) which would 
> facilitate the assigning of column and row names to each table? (And make 
> life easier when doing other things with the data, e.g. plotting...!).
> 
> Many thanks once again - the help offered really is appreciated.
> 
> Steve
> 
> 
> _
> All your Twitter and other social updates in one place 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constrined dependent optimization.

2009-03-30 Thread Ben Bolker
rkevinbur...@charter.net wrote:
> I am sorry but I don't see the connection. With SANN and say 3
> variables, one of the steps may increment x[1] by 0.1. Not only is
> this a non-discrete integer value, but even if I could coerce SANN to
> only return discrete integer values for each step in the optimization,
> once x[1] was set to say 2, I would have to search the other
> "variables" for a value of 2 and exchange x[1] and whichever
> variable was two, so as to maintain the property that each variable
> has a unique discrete value constrained from 1 : number of variables.
> 
> Thank you.
> 
> Kevin

  If you look more closely at the docs for method="SANN" (and
the examples), you'll see that SANN allows you to pass the
"gradient" argument (gr) as a custom function to provide the
candidate distribution.  Here's an example:

N <- 10
xvec <- seq(0,1,length=N)
target <- rank((xvec-0.2)^2)

objfun <- function(x) {
  sum((x-target)^2)/1e6
}

objfun(1:10)

swapfun <- function(x,N=10) {
  loc <- sample(N,size=2,replace=FALSE)
  tmp <- x[loc[1]]
  x[loc[1]] <- x[loc[2]]
  x[loc[2]] <- tmp
  x
}

set.seed(1001)
opt1 <- optim(fn=objfun,
  par=1:N,
  gr=swapfun,method="SANN",
  control=list(trace=10))

plot(opt1$par,target)



>  Ben Bolker  wrote:
>> 
>> 
>> rkevinburton wrote:
>>> I have an optimization question that I was hoping to get some
>>> suggestions on how best to go about solving it. I would think
>>> there is probably a package that addresses this problem.
>>> 
>>> This is an ordering optimization problem. Best to describe it with
>>> a simple example. Say I have 100 "bins" each with a ball in it
>>> numbered from 1 to 100. Each bin can only hold one ball. The
>>> optimization is that I have a function 'f' that takes this array of
>>> bins and returns a number. The number returned from
>>> f(1,2,3,4) would be different from that of 
>>> f(2,1,3,4). The optimization is finding the optimum order of
>>> these balls so as to produce a minimum value from 'f'. I cannot
>>> use the regular 'optim' algorithms because a) the values are
>>> discrete, and b) the values are dependent, i.e. when the "variable"
>>> representing the bin location is changed (in this example a new
>>> ball is put there) the existing ball will need to be moved to
>>> another bin (probably swapping positions), and c) each "variable"
>>> is constrained: in the example above the only allowable values 
>>> are integers from 1-100. So the problem becomes finding the
>>> optimum order of the "balls".
>>> 
>>> Any suggestions?
>>> 
>>> 
>> See method "SANN" under ?optim.
>> 
>> Ben Bolker
>> 
>> -- View this message in context:
>> http://www.nabble.com/Constrined-dependent-optimization.-tp22772520p22772795.html
>>  Sent from the R help mailing list archive at Nabble.com.
>> 
>> __ R-help@r-project.org
>> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>> read the posting guide http://www.R-project.org/posting-guide.html 
>> and provide commented, minimal, self-contained, reproducible code.
> 


-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bol...@ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc



signature.asc
Description: OpenPGP digital signature
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sliding window over irregular intervals

2009-03-30 Thread Irene Gallego Romero

Dear all,

I have some very big data files that look something like this:

id chr pos ihh1 ihh2 xpehh
rs5748748 22 15795572 0.0230222 0.0268394 -0.153413
rs5748755 22 15806401 0.0186084 0.0268672 -0.367296
rs2385785 22 15807037 0.0198204 0.0186616 0.0602451
rs1981707 22 15809384 0.0299685 0.0176768 0.527892
rs1981708 22 15809434 0.0305465 0.0187227 0.489512
rs11914222 22 15810040 0.0307183 0.0172399 0.577633
rs4819923 22 15813210 0.02707 0.0159736 0.527491
rs5994105 22 15813888 0.025202 0.0141296 0.578651
rs5748760 22 15814084 0.0242894 0.0146486 0.505691
rs2385786 22 15816846 0.0173057 0.0107816 0.473199
rs1990483 22 15817310 0.0176641 0.0130525 0.302555
rs5994110 22 15821524 0.0178411 0.0129001 0.324267
rs17733785 22 15822154 0.0201797 0.0182093 0.102746
rs7287116 22 15823131 0.0201993 0.0179028 0.12069
rs5748765 22 15825502 0.0193195 0.0176513 0.090302

I'm trying to extract the maximum and minimum xpehh (last column) values 
within a sliding window (non-overlapping) of width 10000 (calculated 
relative to pos (third column)). However, as you can tell from the brief 
excerpt here, although all possible intervals will probably be covered 
by at least one data point, the number of data points will be variable 
(incidentally, if anyone knows of a way to obtain this number, that 
would be lovely), as will the spacing between them. Furthermore, values 
of chr (second column) will range from 1 to 22, and values of pos will 
be overlapping across them; I want to evaluate the window separately for 
each value of chr.


I've looked at the help and FAQ on sliding windows, but I'm a relative 
newcomer to R and cannot find a way to do what I need to do. Everything 
I've managed to unearth so far seems geared towards smoother time 
series. Any help on this problem would be vastly appreciated.


Thanks,
Irene

--
Irene Gallego Romero
Leverhulme Centre for Human Evolutionary Studies
University of Cambridge
Fitzwilliam St
Cambridge
CB2 1QH
UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how does stop() interfere with on.exit()?

2009-03-30 Thread Wacek Kusnierczyk
consider the following example:

(f = function() on.exit(f()))()
# error: evaluation nested too deeply

(f = function() { on.exit(f()); stop() })()
# error in f():
# error in f():
# ... some 100 lines skipped ...
# error: C stack usage is too close to the limit

why does not the second behave as the first, i.e., report, in one line,
too deep recursion?  the second seems to break the interface by
reporting a condition internal to the implementation, which should not
be visible to the user.

vQ

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constrined dependent optimization.

2009-03-30 Thread Hans W. Borchers

Imagine you want to minimize the following linear function

f <- function(x) sum( c(1:50, 50:1) * x / (50*51) )

on the set of all permutations of the numbers 1,..., 100.

I wonder how will you do that with lpSolve? I would simply order
the coefficients and then sort the numbers 1,...,100 accordingly.
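
A sketch of that sort-based solution -- by the rearrangement inequality the
sum is minimized by placing the largest numbers on the smallest coefficients:

cf <- c(1:50, 50:1) / (50*51)
x  <- integer(100)
x[order(cf)] <- 100:1   # largest ball where the coefficient is smallest
sum(cf * x)             # the minimum of f over all permutations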

I am also wondering how optim with "SANN" could be applied here.

As this is a problem in the area of discrete optimization resp.
constraint programming, I propose to use an appropriate program
here such as the free software Bprolog. I would be interested to
learn what others propose.

Of course, if we don't know anything about the function f then
it amounts to an exhaustive search on the 100! permutations --
probably not a feasible job.

Regards,  Hans Werner



Paul Smith wrote:
> 
> On Sun, Mar 29, 2009 at 9:45 PM,   wrote:
>> I have an optimization question that I was hoping to get some suggestions
>> on how best to go about solving it. I would think there is probably a
>> package that addresses this problem.
>>
>> This is an ordering optimization problem. Best to describe it with a
>> simple example. Say I have 100 "bins" each with a ball in it numbered
>> from 1 to 100. Each bin can only hold one ball. The optimization is that
>> I have a function 'f' that takes this array of bins and returns a number. The
>> number returned from f(1,2,3,4) would be different from
>> that of f(2,1,3,4). The optimization is finding the optimum order of
>> these balls so as to produce a minimum value from 'f'. I cannot use the
>> regular 'optim' algorithms because a) the values are discrete, and b) the
>> values are dependent, i.e. when the "variable" representing the bin
>> location is changed (in this example a new ball is put there) the
>> existing ball will need to be moved to another bin (probably swapping
>> positions), and c) each "variable" is constrained: in the example above
>> the only allowable values are integers from 1-100. So the problem becomes
>> finding the optimum order of the "balls".
>>
>> Any suggestions?
> 
> If your function f is linear, then you can use lpSolve.
> 
> Paul
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Constrined-dependent-optimization.-tp22772520p22782922.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls, convergence and starting values

2009-03-30 Thread Christian Ritz
Hi Patrick,

there exists specialized functionality in R that offers both automated 
calculation of
starting values and relatively robust optimization, which can be used with 
success in many
common cases of nonlinear regression, including for your data:

library(drc)  # on CRAN

## Fitting 3-parameter logistic model
## (slightly different parameterization from SSlogis())
bdd.m1 <- drm(pourcma~transat, weights=sqrt(nbfeces), data=bdd, fct=L.3())

plot(bdd.m1, broken=TRUE, conLevel=0.0001)

summary(bdd.m1)


Of course, standard errors are huge as the data do not really support this 
model (as
already pointed out by other replies to this post).


Christian

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Burt table from word frequency list

2009-03-30 Thread Joan-Josep Vallbé
Thank you very much for all your comments, and sorry for the confusion  
of my messages. My corpus is a collection of responses to an open  
question from a questionnaire. My intention is not to create  
groups of respondents but to treat all responses as a "whole  
discourse" on a particular issue, so that I can find out different  
"semantic contexts" within the text. I have all the responses in a  
single document, and I want to split it into strings of (specified) n  
words. The resulting semantic contexts would be sets of (correlated)  
word-strings containing particularly relevant (correlated) words.


I guess I must dive deeper into the "ca" and "tm" packages. Any other  
ideas will be really welcomed.


best,

Pep Vallbé





On Mar 30, 2009, at 2:05 PM, Alan Zaslavsky wrote:

Maybe not terribly hard, depending on exactly what you need.   
Suppose you turn your text into a character vector 'mytext' of  
words.  Then for a table of words appearing delta words apart  
(ordered), you can table mytext against itself with a lag:


nwords = length(mytext)
## pair each word with the word delta positions before it:
burttab = table(mytext[-(1:delta)], mytext[-(nwords + 1 - (1:delta))])

Add to its transpose and sum over delta up to your maximum distance  
apart. If you want only words appearing near each other within the  
same sentence (or some other unit), pad out the sentence break with  
at least delta instances of a dummy spacer:


   the cat chased the greedy rat SPACER SPACER SPACER the dog chased  
the

   clever cat

This will count all pairings at distance delta; if you want to count  
only those for which this was the NEAREST co-occurrence (so


   the cat and the rat chased the dog

would count as two at delta=3 but not one at delta=6) it will be  
trickier and I'm not sure this approach can be modified to handle it.
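
A minimal self-contained sketch of the whole recipe (the toy text and the
distance cutoff are made up for illustration):

mytext  <- c("the", "cat", "chased", "the", "greedy", "rat")
words   <- sort(unique(mytext))
n       <- length(mytext)
maxdist <- 3
burt    <- matrix(0, length(words), length(words),
                  dimnames = list(words, words))
for (delta in 1:maxdist) {
  tab  <- table(factor(mytext[1:(n - delta)], levels = words),
                factor(mytext[(1 + delta):n], levels = words))
  burt <- burt + tab + t(tab)   # count each pair in both orders
}
burt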



Date: Sun, 29 Mar 2009 22:20:15 -0400
From: "Murray Cooper" 
Subject: Re: [R] Burt table from word frequency list
The usual approach is to count the co-occurrence within so many  
words of

each other.  Typical is between 5 words before and 5 words after a
given word.  So for each word in the document, you look for the
occurence of all other words within -5 -4 -3 -2 -1 0 1 2 3 4 5 words.
Depending on the language and the question being asked certain words
may be excluded.
This is not a simple function! I don't know if anyone has done a
package for this type of analysis, but with over 2000 packages  
floating

around you might get lucky.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PLS package loading error!

2009-03-30 Thread James W. MacDonald

Hi Damien,

How did you install the package? Usually this error pops up when people 
simply download the zip file and then unzip into their library directory.


If you use the package installation functions in R, you shouldn't have 
this problem:


install.packages("pls")

Best,

Jim



mienad wrote:

Hi,

I am using  R 2.8.1 version on Windows with RGui. I have loaded pls package
lattest version (2.1-0). When I try to load this package in R using
library(pls) command, the following error message appear:

Erreur dans library(pls) : 
  'pls' n'est pas un package valide -- a-t-il été installé < 2.0.0 ?


Could you please help me to solve this problem?

Regards 


Damien


--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constrined dependent optimization.

2009-03-30 Thread Paul Smith
Actually, one can use lpSolve to find a solution to your example. To
be more precise, it would be necessary to solve a sequence of linear
*integer* programs. The first one would be:

max f(x)

subject to

x >= 0
x <= 100
sum(x) = 100.

From this, one would learn the optimal position of the number 100
(coefficient 50). Afterwards, one would remove the coefficient 50 from
the objective function, and the constraints would be:

x >= 0
x <= 99
sum(x) = 99.

The optimal position for the number 99 would be returned by lpSolve. And so on.
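
A more direct route, for what it's worth: with a linear f the whole problem
is a standard assignment problem, which lpSolve handles in one call via
lp.assign. A sketch using the coefficients from Hans Werner's example:

library(lpSolve)
cf   <- c(1:50, 50:1)
cost <- outer(cf, 1:100)                   # cost[i, j]: ball j put in bin i
sol  <- lp.assign(cost, direction = "max")
x    <- apply(sol$solution, 1, which.max)  # ball assigned to each bin
sum(cf * x)                                # 170425, the optimum cited below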

Paul


On Mon, Mar 30, 2009 at 2:22 PM, Hans W. Borchers
 wrote:
>
> Image you want to minimize the following linear function
>
>    f <- function(x) sum( c(1:50, 50:1) * x / (50*51) )
>
> on the set of all permutations of the numbers 1,..., 100.
>
> I wonder how will you do that with lpSolve? I would simply order
> the coefficients and then sort the numbers 1,...,100 accordingly.
>
> I am also wondering how optim with "SANN" could be applied here.
>
> As this is a problem in the area of discrete optimization resp.
> constraint programming, I propose to use an appropriate program
> here such as the free software Bprolog. I would be interested to
> learn what others propose.
>
> Of course, if we don't know anything about the function f then
> it amounts to an exhaustive search on the 100! permutations --
> probably not a feasible job.
>
> Regards,  Hans Werner
>
>
>
> Paul Smith wrote:
>>
>> On Sun, Mar 29, 2009 at 9:45 PM,   wrote:
>>> I have an optimization question that I was hoping to get some suggestions
>>> on how best to go about solving it. I would think there is probably a
>>> package that addresses this problem.
>>>
>>> This is an ordering optimization problem. Best to describe it with a
>>> simple example. Say I have 100 "bins" each with a ball in it numbered
>>> from 1 to 100. Each bin can only hold one ball. The optimization is that
>>> I have a function 'f' that takes this array of bins and returns a number. The
>>> number returned from f(1,2,3,4) would be different from
>>> that of f(2,1,3,4). The optimization is finding the optimum order of
>>> these balls so as to produce a minimum value from 'f'. I cannot use the
>>> regular 'optim' algorithms because a) the values are discrete, and b) the
>>> values are dependent, i.e. when the "variable" representing the bin
>>> location is changed (in this example a new ball is put there) the
>>> existing ball will need to be moved to another bin (probably swapping
>>> positions), and c) each "variable" is constrained: in the example above
>>> the only allowable values are integers from 1-100. So the problem becomes
>>> finding the optimum order of the "balls".
>>>
>>> Any suggestions?
>>
>> If your function f is linear, then you can use lpSolve.
>>
>> Paul
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Constrined-dependent-optimization.-tp22772520p22782922.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [OT] Contacting "Introductory Statistics for Engineering Experimentation" authors

2009-03-30 Thread Douglas Bates
I have been examining the text "Introductory Statistics for
Engineering Experimentation" by Peter R. Nelson, Marie Coffin and
Karen A.F. Copeland (Elsevier, 2003).  There are several interesting
data sets used in the book and I plan to create an R package for them.
 I would like to contact the surviving authors (apparently Peter R.
Nelson died in 2004) but have not been able to obtain contact
information for them.  According to the preface the book was developed
for an intro engineering stats course at Clemson however no one at
Clemson could provide any leads.  Does anyone on this list have
contact information for Marie Coffin or Karen A.F. Copeland?  I have
been unsuccessful in various google searches.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Add missing values/timestamps

2009-03-30 Thread j.k

Hello everyone,
I have the following problem and maybe someone can help me with it.
I have a list of values with times. They look like this:

   V1   V2
1 2008-10-14 08:45:00 94411.08
2 2008-10-14 08:50:00 90745.45
3 2008-10-14 08:55:00 82963.35
4 2008-10-14 09:00:00 75684.38
5 2008-10-14 09:05:00 78931.82
6 2008-10-14 09:20:00 74580.11
7 2008-10-14 09:25:00 69666.48
8 2008-10-14 09:30:00 77794.89

I have these data combined from different series of measurements.
As you can see, the problem is that there are gaps between these series, which
I want to fill.

The format of the time is POSIXct

Are there any suggestions how I can fill these missing times and afterwards
interpolate/predict their values?
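
One minimal sketch, assuming the data frame is called dat, with columns V1
(POSIXct) and V2, and that the underlying grid is 5 minutes: build the full
grid, merge, then interpolate linearly over the gaps:

full <- data.frame(V1 = seq(min(dat$V1), max(dat$V1), by = "5 min"))
dat2 <- merge(full, dat, all.x = TRUE)   # NA rows where a timestamp was missing
dat2$V2 <- approx(as.numeric(dat$V1), dat$V2,
                  xout = as.numeric(dat2$V1))$y   # linear interpolation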

Thanks in advance
Johannes
-- 
View this message in context: 
http://www.nabble.com/Add-missing-values-timestamps-tp22784737p22784737.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sliding window over irregular intervals

2009-03-30 Thread David Winsemius
The window you describe is not one I would call sliding, and the  
intervals are regular with an irregular number of events within the  
windows. One way would be to use the results of trunc(pos/10000) as a  
factor with tapply:


(Related functions are floor() and round(), but your pos values appear  
to be positive, so there should not be problems with how they work  
across 0)


After creating a dataframe, dta, try something like:

> tapply(dta$xpehh, as.factor(trunc(dta$pos/10000)), min)
     1579      1580      1581      1582 
-0.153413 -0.367296  0.302555  0.090302 
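
To get the minimum, the maximum, and the number of points per window, and to
keep chromosomes separate, the same idea extends to a combined grouping
factor -- a sketch:

grp <- interaction(dta$chr, trunc(dta$pos/10000), drop = TRUE)
data.frame(min = tapply(dta$xpehh, grp, min),
           max = tapply(dta$xpehh, grp, max),
           n   = tapply(dta$xpehh, grp, length))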

--
David Winsemius
On Mar 30, 2009, at 9:01 AM, Irene Gallego Romero wrote:


Dear all,

I have some very big data files that look something like this:

id chr pos ihh1 ihh2 xpehh
rs5748748 22 15795572 0.0230222 0.0268394 -0.153413
rs5748755 22 15806401 0.0186084 0.0268672 -0.367296
rs2385785 22 15807037 0.0198204 0.0186616 0.0602451
rs1981707 22 15809384 0.0299685 0.0176768 0.527892
rs1981708 22 15809434 0.0305465 0.0187227 0.489512
rs11914222 22 15810040 0.0307183 0.0172399 0.577633
rs4819923 22 15813210 0.02707 0.0159736 0.527491
rs5994105 22 15813888 0.025202 0.0141296 0.578651
rs5748760 22 15814084 0.0242894 0.0146486 0.505691
rs2385786 22 15816846 0.0173057 0.0107816 0.473199
rs1990483 22 15817310 0.0176641 0.0130525 0.302555
rs5994110 22 15821524 0.0178411 0.0129001 0.324267
rs17733785 22 15822154 0.0201797 0.0182093 0.102746
rs7287116 22 15823131 0.0201993 0.0179028 0.12069
rs5748765 22 15825502 0.0193195 0.0176513 0.090302

I'm trying to extract the maximum and minimum xpehh (last column)  
values within a sliding window (non-overlapping) of width 10000  
(calculated relative to pos (third column)). However, as you can  
tell from the brief excerpt here, although all possible intervals  
will probably be covered by at least one data point, the number of  
data points will be variable (incidentally, if anyone knows of a way  
to obtain this number, that would be lovely), as will the spacing  
between them. Furthermore, values of chr (second column) will range  
from 1 to 22, and values of pos will be overlapping across them; I  
want to evaluate the window separately for each value of chr.


I've looked at the help and FAQ on sliding windows, but I'm a  
relative newcomer to R and cannot find a way to do what I need to  
do. Everything I've managed to unearth so far seems geared towards  
smoother time series. Any help on this problem would be vastly  
appreciated.


Thanks,
Irene

--
Irene Gallego Romero
Leverhulme Centre for Human Evolutionary Studies
University of Cambridge
Fitzwilliam St
Cambridge
CB2 1QH
UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix max by row

2009-03-30 Thread Bert Gunter
 
Serves me right, I suppose. Timing seems also very dependent on the
dimensions of the matrix. Here's what I got with my inadequate test:

> x <- matrix(rnorm(3e5),ncol=3)
## via apply
> system.time(apply(x,1,max))
   user  system elapsed 
    2.09    0.02    2.10

## via pmax 
> system.time(do.call(pmax,data.frame(x)))
   user  system elapsed 
    0.10    0.02    0.11 
>

Draw your own conclusions!

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374

-Original Message-
From: Wacek Kusnierczyk [mailto:waclaw.marcin.kusnierc...@idi.ntnu.no] 
Sent: Monday, March 30, 2009 2:33 AM
To: Rolf Turner
Cc: Bert Gunter; 'Ana M Aparicio Carrasco'; r-help@r-project.org
Subject: Re: [R] Matrix max by row

Rolf Turner wrote:
> I tried the following:
>
> m <- matrix(runif(100000),1000,100)
> junk <- gc()
> print(system.time(for(i in 1:100) X1 <- do.call(pmax,data.frame(m))))
> junk <- gc()
> print(system.time(for(i in 1:100) X2 <- apply(m,1,max)))
>
> and got
>
>user  system elapsed
>   2.704   0.110   2.819
>user  system elapsed
>   1.938   0.098   2.040
>
> so unless there's something that I am misunderstanding (always a serious
> consideration) Wacek's apply method looks to be about 1.4 times
> *faster* than
> the do.call/pmax method.


hmm, since i was called by name (i'm grateful, rolf), i feel obliged to
check the matters myself:

# dummy data, presumably a 'large matrix'?
n = 5e3
m = matrix(rnorm(n^2), n, n)

# what is to be benchmarked...
waku = expression(matrix(apply(m, 1, max), nrow(m)))
bert = expression(do.call(pmax,data.frame(m)))

# to be benchmarked
library(rbenchmark)
benchmark(replications=10, order='elapsed', columns=c('test',
'elapsed'),
   waku=matrix(apply(m, 1, max), nrow(m)),
   bert=do.call(pmax,data.frame(m)))

takes quite a while, but here you go:

#   test elapsed
# 1 waku  11.838
# 2 bert  20.833

where bert's solution seems to require a wonder to 'be considerably
faster for large matrices'.  to have it fair, i also did

# to be benchmarked
library(rbenchmark)
benchmark(replications=10, order='elapsed', columns=c('test',
'elapsed'),
   bert=do.call(pmax,data.frame(m)),
   waku=matrix(apply(m, 1, max), nrow(m)))

#  test elapsed
# 2 waku  11.695
# 1 bert  20.912
   
take home point: a good product sells itself, a bad product may not sell
despite aggressive marketing.

rolf, thanks for pointing this out.

cheers,
vQ


> cheers,
>
> Rolf Turner
>
>
> On 30/03/2009, at 3:55 PM, Bert Gunter wrote:
>
>> If speed is a consideration,availing yourself of the built-in pmax()
>> function via
>>
>> do.call(pmax,data.frame(yourMatrix))
>>
>> will be considerably faster for large matrices.
>>
>> If you are puzzled by why this works, it is a useful exercise in R to
>> figure
>> it out.
>>
>> Hint:The man page for ?data.frame says:
>> "A data frame is a list of variables of the same length with unique row
>> names, given class 'data.frame'."
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>> Genentech Nonclinical Statistics
>>
>> -Original Message-
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On
>> Behalf Of Wacek Kusnierczyk
>> Sent: Saturday, March 28, 2009 5:22 PM
>> To: Ana M Aparicio Carrasco
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Matrix max by row
>>
>> Ana M Aparicio Carrasco wrote:
>>> I need help about how to obtain the max by row in a matrix.
>>> For example if I have the following matrix:
>>> 2 5 3
>>> 8 7 2
>>> 1 8 4
>>>
>>> The max by row will be:
>>> 5
>>> 8
>>> 8
>>>
>>
>> matrix(apply(m, 1, max), nrow(m))
>>
>> vQ
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] List assignment in a while loop and timing

2009-03-30 Thread Saptarshi Guha
Hello R users,
I have a question about the time involved in list assignment.
Consider the following code snippet (see below). The first line creates
a reader object,
which is the interface to 1MM key-value pairs (serialized R objects) spanning 50
files (a total of 50MB). rhsqstart initiates the reading and I loop, reading
each key-value pair using rhsqnextKVR. If this returns NULL, we switch to the
next file, and if that in turn returns NULL, we break.

If I comment out line A1, it takes 39 seconds on a quad core intel with
16GB ram running R-2.8
If I include the assignment A1 it takes ~85 seconds.

I have preassigned the list in line A0, so I'm guessing there is no resizing
going on, so why does the time increase so much?

Thank you for your time.
Regards
Saptarshi


==code==
rdr <- rhsqreader("~/tmp/pp",local=T,pattern="^p")
rdr <- rhsqstart(rdr)
i <- 1;
h=as.list(rep(1,1e6)) ##A0
while(TRUE){
  value <-rhsqnextKVR(rdr) ##Returns a list of two elements K,V
  if(is.null(value)) {
message(rdr$df[rdr$current])
rdr <- rhsqnextpath(rdr)
if(is.null(rdr)) break;
  }
  h[[i]] <- value; ##A1
  i <- i+1
}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2009-03-30 Thread milton ruser
How about including a subject line like
"Problem with R 2.3.0 and MySQL on Mandriva-2007"?

Bests,

milton

On Mon, Mar 30, 2009 at 7:07 AM, ankhee dutta  wrote:

> Hi, All
>  I have a Linux system (Mandriva 2007) with R version 2.3.0 and MySQL
> 5.0.0. I also have the DBI R database interface, version 0.1-11, installed on
> my Linux system. While installing the RMySQL package, version 0.5-11, I am facing
> the problem mentioned below.
>
>
>
> * Installing *source* package 'RMySQL' ...
> creating cache ./config.cache
> checking how to run the C preprocessor... cc -E
> checking for compress in -lz... yes
> checking for getopt_long in -lc... yes
> checking for mysql_init in -lmysqlclient... no
> checking for mysql.h... no
> checking for mysql_init in -lmysqlclient... no
> checking for mysql_init in -lmysqlclient... no
> checking for mysql_init in -lmysqlclient... no
> checking for mysql_init in -lmysqlclient... no
> checking for mysql_init in -lmysqlclient... no
> checking for /usr/local/include/mysql/mysql.h... no
> checking for /usr/include/mysql/mysql.h... no
> checking for /usr/local/mysql/include/mysql/mysql.h... no
> checking for /opt/include/mysql/mysql.h... no
> checking for /include/mysql/mysql.h... no
>
> Configuration error:
>  could not find the MySQL installation include and/or library
>  directories.  Manually specify the location of the MySQL
>  libraries and the header files and re-run R CMD INSTALL.
>
> INSTRUCTIONS:
>
> 1. Define and export the 2 shell variables PKG_CPPFLAGS and
>   PKG_LIBS to include the directory for header files (*.h)
>   and libraries, for example (using Bourne shell syntax):
>
>  export PKG_CPPFLAGS="-I"
>  export PKG_LIBS="-L -lmysqlclient"
>
>   Re-run the R INSTALL command:
>
>  R CMD INSTALL RMySQL_.tar.gz
>
> 2. Alternatively, you may pass the configure arguments
>  --with-mysql-dir= (distribution directory)
>   or
>  --with-mysql-inc= (where MySQL header files reside)
>  --with-mysql-lib= (where MySQL libraries reside)
>   in the call to R INSTALL --configure-args='...'
>
>   R CMD INSTALL --configure-args='--with-mysql-dir=DIR'
> RMySQL_.tar.gz
>
> ERROR: configuration failed for package 'RMySQL'
> ** Removing '/usr/lib/R/library/RMySQL'
>
>
>
>
>
> Any help will be great.
> Thank you in advance.
>
> --
> Ankhee Dutta
> project trainee,
> JNU,New Delhi-67
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Excellent Talk on Statistics (Good examples of stat. visualization)

2009-03-30 Thread Ken-JP


A talk with very good examples of statistical visualization:

"Talks Hans Rosling: Debunking third-world myths with the best stats you've
ever seen"

http://www.ted.com/index.php/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

-- 
View this message in context: 
http://www.nabble.com/Excellent-Talk-on-Statistics-%28Good-examples-of-stat.-visualization%29-tp22785778p22785778.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [OT] Contacting "Introductory Statistics for EngineeringExperimentation" authors

2009-03-30 Thread Gaj Vidmar
Two authors appear to be the same as those of the book "Analysis of Means" (ANOM), 
which I have read and which has a website at http://www.analysisofmeans.com/

If I remember correctly, Mr. Nelson is deceased, but you might nevertheless 
reach Mrs. Copeland by following the Contact Us link at the ANOM website, which 
leads to i...@analysisofmeans.com.

Or, you might be able to contact her through Boulder Statistics (another 
link at the ANOM website) at http://www.boulderstats.com/ 
(geti...@boulderstats.com).

Regards,
Assist.Prof. Gaj Vidmar, PhD
Institute for Rehabilitation, Republic of Slovenia

"Douglas Bates"  wrote in message 
news:40e66e0b0903300725k55ac5294m50f4f953047b0...@mail.gmail.com...
>I have been examining the text "Introductory Statistics for
> Engineering Experimentation" by Peter R. Nelson, Marie Coffin and
> Karen A.F. Copeland (Elsevier, 2003).  There are several interesting
> data sets used in the book and I plan to create an R package for them.
> I would like to contact the surviving authors (apparently Peter R.
> Nelson died in 2004) but have not been able to obtain contact
> information for them.  According to the preface the book was developed
> for an intro engineering stats course at Clemson however no one at
> Clemson could provide any leads.  Does anyone on this list have
> contact information for Marie Coffin or Karen A.F. Copeland?  I have
> been unsuccessful in various google searches.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to input multiple .txt files

2009-03-30 Thread Mike Lawrence
To repent for my sins, I'll also suggest that Hadley Wickham's "plyr"
package (http://had.co.nz/plyr/) is useful/parsimonious in this
context:

a <- ldply(cust1_files,read.table)


On Mon, Mar 30, 2009 at 9:32 AM, baptiste auguie  wrote:
> may i suggest the following,
>
>
> a <- do.call(rbind, lapply(cust1_files, read.table))
>
> (i believe expanding objects in a for loop belongs to the R Inferno)
>
> baptiste
>
> On 30 Mar 2009, at 12:58, Mike Lawrence wrote:
>
>>
>> cust1_files =
>> list.files(path=path_to_my_files,pattern='cust1',full.names=TRUE)
>> a=NULL
>> for(this_file in cust1_files){
>>      a=rbind(a,read.table(this_file))
>> }
>> write.table(a,'cust1.master.txt')
>
> _
>
> Baptiste Auguié
>
> School of Physics
> University of Exeter
> Stocker Road,
> Exeter, Devon,
> EX4 4QL, UK
>
> Phone: +44 1392 264187
>
> http://newton.ex.ac.uk/research/emag
> __
>
>



-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tinyurl.com/mikes-public-calendar

~ Certainty is folly... I think. ~

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Column name assignment problem

2009-03-30 Thread Steve Murray

Dear Peter, Jim and all,

Thanks for the information regarding how to structure 'assign' commands. I've 
had a go at doing this, based on your advice, and although I feel I'm a lot 
closer now, I can't quite get it to work:

rnames <- sprintf("%.2f", seq(from = -89.75, to = 89.75, length = 360))
columnnames <- sprintf("%.2f", seq(from = -179.75, to = 179.75, length = 720))

for (i in 1:120) {
Fekete_table <- get(paste("Fekete_", index$year[i], index$month[i], 
sep=''))
colnames(Fekete_table) <- columnnames
rownames(Fekete_table) <- rnames
assign(paste("Fekete_",index$year[i], index$month[i], sep=''),
colnames(Fekete_table))
}

This assigns the column headings to each table, so that each table doesn't 
contain data any longer, but simply the column values. I tried inserting 
assign(colnames(paste("Fekete_"...) but this resulted in the type of error that 
was mentioned in the previous message. I've run dry of ideas as to how I should 
restructure the commands, so would be grateful for any pointers.
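
For the record, a sketch of the fix following the get / modify / assign
recipe from Peter's earlier reply -- assign the modified table itself back
under its name, not its column names:

for (i in 1:120) {
    nm <- paste("Fekete_", index$year[i], index$month[i], sep = "")
    Fekete_table <- get(nm)
    colnames(Fekete_table) <- columnnames
    rownames(Fekete_table) <- rnames
    assign(nm, Fekete_table)   # write the whole table back, not colnames()
}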

Many thanks,

Steve


_
[[elided Hotmail spam]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mature SOAP Interface for R

2009-03-30 Thread Michael Lawrence
On Sat, Mar 28, 2009 at 6:08 PM, zubin  wrote:

> Hello, we are writing rich internet user interfaces and would like to call R for
> some of the computational needs on the data, as well as some creation of
> image files.  Our objects communicate via the SOAP interface.  We have been
> researching the various packages to expose R as a SOAP service.
>
> No current CRAN SOAP packages however.
>
> Found 3 to date:
>
> RSOAP (http://sourceforge.net/projects/rsoap/)
> SSOAP http://www.omegahat.org/SSOAP/
>
> looks like a commercial version?
> http://random-technologies-llc.com/products/rsoap
>
> Does anyone have experience with these 3 and can recommend the most
> 'mature' R - SOAP interface package?
>

Well, SSOAP is (the last time I checked) just a SOAP client. rsoap (if we're
talking about the same package) is actually a python SOAP server that
communicates with R via rpy.

You might want to check out the RWebServices package in Bioconductor. I
think it uses Java for its SOAP handling.

Michael


>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unicode only works with a second one

2009-03-30 Thread Greg Snow
I don't know how to help with the Unicode issue, but one alternative is the 
my.symbols function in the TeachingDemos package (see ?ms.male as well as 
?my.symbols).
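
An untested minimal sketch of that route (TeachingDemos provides ms.male and
friends as symbol-drawing functions):

library(TeachingDemos)
plot(c(-1, 1), c(-4, -2), type = "n")
my.symbols(0, -3, symb = ms.male, inches = 0.3)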

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Thomas Steiner
> Sent: Saturday, March 28, 2009 9:19 AM
> To: r-h...@stat.math.ethz.ch
> Subject: [R] unicode only works with a second one
> 
> I'd like to paste a zodiac sign on a graph, but it only prints it when
> I add another unicode character ( \u3030) to the desired \u2648 - why?
> See the examplecode (compare the orange with the skyblue):
> 
> plot(c(-1,1),c(-4,-2),type="n")
> text(x=0,y=-3.0,labels="\u2648 \u3030",cex=2.3,col="skyblue")
> text(x=0,y=-3.2,labels="\u2648",cex=2.3,col="orange")
> zodiac=c("\u2642 \u2643 \u2644 \u2645 \u2646 \u2647 \u2648 \u2649
> \u2650 \u2651 \u2652 \u2653")
> text(x=0,y=-3.5,labels=paste(zodiac,"\u3030"),cex=2.3,col="navy")
> 
> I use R version 2.8.1 (2008-12-22) under MS Windows Vista.
> Thanks for help
> Thomas
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constrined dependent optimization.

2009-03-30 Thread Paul Smith
Optim with SANN also solves your example:

---

f <- function(x) sum(c(1:50,50:1)*x)

swapfun <- function(x,N=100) {
 loc <- sample(N,size=2,replace=FALSE)
 tmp <- x[loc[1]]
 x[loc[1]] <- x[loc[2]]
 x[loc[2]] <- tmp
 x
}

N <- 100

# note: a very large maxit is needed to reach the optimum
opt1 <- optim(fn=f, par=sample(1:N,N), gr=swapfun, method="SANN",
              control=list(maxit=5e6, fnscale=-1, trace=10))
opt1$par
opt1$value

---

We need to specify a large number of iterations to get the optimal
solution. The objective function at the optimum is 170425, and one
gets a close value with optim and SANN.

Paul


On Mon, Mar 30, 2009 at 2:22 PM, Hans W. Borchers
 wrote:
>
> Imagine you want to minimize the following linear function
>
>    f <- function(x) sum( c(1:50, 50:1) * x / (50*51) )
>
> on the set of all permutations of the numbers 1,..., 100.
>
> I wonder how you will do that with lpSolve? I would simply order
> the coefficients and then sort the numbers 1,...,100 accordingly.
>
> I am also wondering how optim with "SANN" could be applied here.
>
> As this is a problem in the area of discrete optimization and
> constraint programming, I propose to use an appropriate program
> here such as the free software Bprolog. I would be interested to
> learn what others propose.
>
> Of course, if we don't know anything about the function f then
> it amounts to an exhaustive search on the 100! permutations --
> probably not a feasible job.
>
> Regards,  Hans Werner
>
>
>
> Paul Smith wrote:
>>
>> On Sun, Mar 29, 2009 at 9:45 PM,   wrote:
>>> I have an optimization question that I was hoping to get some suggestions
>>> on how best to go about solving it. I would think there is probably a
>>> package that addresses this problem.
>>>
>>> This is an ordering optimization problem. Best to describe it with a
>>> simple example. Say I have 100 "bins", each with a ball in it numbered
>>> from 1 to 100. Each bin can only hold one ball. This optimization is that
>>> I have a function 'f' that takes this array of bins and returns a number.
>>> The call f(1,2,3,4) would return a different number from that of
>>> f(2,1,3,4). The optimization is finding the optimum order of these balls
>>> so as to produce a minimum value from 'f'. I cannot use the regular
>>> 'optim' algorithms because a) the values are discrete, b) the values are
>>> dependent, i.e. when the "variable" representing the bin location is
>>> changed (in this example a new ball is put there) the existing ball will
>>> need to be moved to another bin (probably swapping positions), and c)
>>> each "variable" is constrained, in the example above the only allowable
>>> values are integers from 1-100. So the problem becomes finding the
>>> optimum order of the "balls".
>>>
>>> Any suggestions?
>>
>> If your function f is linear, then you can use lpSolve.
>>
>> Paul
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Constrined-dependent-optimization.-tp22772520p22782922.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Nonparametric analysis of repeated measurements data with sm library

2009-03-30 Thread Alphonse Monkamg
Dear all,
Does anybody know how to get more evaluation points when performing nonparametric
analysis of repeated measurements data with the "sm" library? The following command
gives the estimate at 50 points, but I would like to increase this to 100 points,
and I do not know how to do that.
library(sm)
provide.data(citrate, options=list(describe=FALSE))
provide.data(dogs, options=list(describe=FALSE))
a <- sm.rm(y=citrate, display.rice=TRUE)
a$eval.points
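
(One thing worth trying, on the assumption that sm.rm forwards extra
arguments to the underlying sm engine, whose grid size is set by the
ngrid option -- this is an untested assumption:)

a <- sm.rm(y=citrate, display.rice=TRUE, ngrid=100)
length(a$eval.points)  # 100 if ngrid is honoured here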
 
Many thanks in advance.
Alphonse


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NY City Conf for Enthusiastic Users of R, June 18-19, 2009

2009-03-30 Thread HRISHIKESH D. VINOD

Conference on Quantitative Social Science
Research Using R
June 18-19 (Thursday-Friday), 2009, Fordham University, 113 West 60th
Street, New York. (next door to Lincoln Center for Performing Arts).

conf. website: http://www.cis.fordham.edu/QR2009


Hrishikesh (Rick) D. Vinod
Professor of Economics, Fordham University
author of new econometrics book using R:
http://www.worldscibooks.com/economics/6895.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] HELP WITH SEM LIBRARY AND WITH THE MODEL'S SPECIFICATION

2009-03-30 Thread Analisi Dati
Dear users,
I'm using the sem package in R because I need to perform a confirmatory factor
analysis.
I have many questions in my survey, and I suppose, for example, that
Question 1 (Q1), Q2 and Q3 explain the same thing (factor F1), Q4, Q5 and Q6
explain F2, and Q7 and Q8 explain F3...
To check that what I supposed is true, I run this code to see whether the
loadings are large or not.
(In this code I used more than 3 factors.)
 
library("sem")
#put in "mydata", the value of the questions
mydata <- 
data.frame(X$X12a,X$X12b,X$X12c,X$X12d,X$X12e,X$X12f,X$X12g,X$X12h,X$X12i,X$X12l,X$X12m,X$X12n,X$X12o,X$X12p,X$X12q,X$X12r,X$X12s,X$X1a,X$X1b,X$X1c,X$X1d,X$X1e,X$X1f,X$X3h,X$X3i,X$X3l,X$X3m,X$X3n,X$X3o,X$X3p,X$X3q,X$X3r,X$X3s,X$X3t,X$X3u,X$X3v,X$X4a,X$X5q,X$X5r,X$X5s,X$X8a,X$X8b,X$X8c,X$X8d)
# compute the covariance matrix of the data
mydata.cov <- cov(mydata,use="complete.obs")
# specify the model
model.mydata <- specify.model() 
F1 ->  X.X12a, lam1, NA
F1 ->  X.X12b, lam2, NA 
F1 ->  X.X12c, lam3, NA
F1 ->  X.X12d, lam4, NA
F1 ->  X.X12e, lam5, NA 
F1 ->  X.X12f, lam6, NA
F1 ->  X.X12g, lam7, NA
F2 ->  X.X12h, lam8, NA 
F2 ->  X.X12i, lam9, NA 
F2 ->  X.X12l, lam10, NA 
F2 ->  X.X12m, lam11, NA 
F2 ->  X.X12n, lam12, NA 
F2 ->  X.X12o, lam13, NA 
F3 ->  X.X12p, lam14, NA 
F3 ->  X.X12q, lam15, NA 
F3 ->  X.X12r, lam16, NA 
F3 ->  X.X12s, lam17, NA 
F4 ->  X.X1a, lam18, NA 
F4 ->  X.X1b, lam19, NA 
F4 ->  X.X1c, lam20, NA 
F4 ->  X.X1d, lam21, NA 
F4 ->  X.X1e, lam22, NA 
F4 ->  X.X1f, lam23, NA 
F5 ->  X.X3h, lam24, NA 
F5 ->  X.X3i, lam25, NA 
F5 ->  X.X3l, lam26, NA 
F5 ->  X.X3m, lam27, NA 
F5 ->  X.X3n, lam28, NA 
F5 ->  X.X3o, lam29, NA 
F5 ->  X.X3p, lam30, NA 
F5 ->  X.X3q, lam31, NA 
F6 ->  X.X3r, lam32, NA 
F6 ->  X.X3s, lam33, NA 
F6 ->  X.X3t, lam34, NA 
F6 ->  X.X3u, lam35, NA 
F6 ->  X.X3v, lam36, NA 
F6 ->  X.X4a, lam37, NA 
F7 ->  X.X5q, lam38, NA 
F7 ->  X.X5r, lam39, NA
F7 ->  X.X5s, lam40, NA
F8 ->  X.X8a, lam41, NA
F8 ->  X.X8b, lam42, NA
F8 ->  X.X8c, lam43, NA
F8 ->  X.X8d, lam44, NA
X.X12a <-> X.X12a, e1,   NA 
X.X12b <-> X.X12b, e2,   NA 
X.X12c <-> X.X12c, e3,   NA 
X.X12d <-> X.X12d, e4,   NA 
X.X12e <-> X.X12e, e5,   NA 
X.X12f <-> X.X12f, e6,   NA 
X.X12g <-> X.X12g, e7,   NA 
X.X12h <-> X.X12h, e8,   NA 
X.X12i <-> X.X12i, e9,   NA 
X.X12l <-> X.X12l, e10,   NA 
X.X12m <-> X.X12m, e11,   NA 
X.X12n <-> X.X12n, e12,   NA
X.X12o <-> X.X12o, e13,   NA
X.X12p <-> X.X12p, e14,   NA
X.X12q <-> X.X12q, e15,   NA
X.X12r <-> X.X12r, e16,   NA
X.X12s <-> X.X12s, e17,   NA
X.X1a <-> X.X1a, e18,   NA
X.X1b <-> X.X1b, e19,   NA
X.X1c <-> X.X1c, e20,   NA
X.X1d <-> X.X1d, e21,   NA
X.X1e <-> X.X1e, e22,   NA
X.X1f <-> X.X1f, e23,   NA
X.X3h <-> X.X3h, e24,   NA
X.X3i <-> X.X3i, e25,   NA
X.X3l <-> X.X3l, e26,   NA
X.X3m <-> X.X3m, e27,   NA
X.X3n <-> X.X3n, e28,   NA
X.X3o <-> X.X3o, e29,   NA
X.X3p <-> X.X3p, e30,   NA
X.X3q <-> X.X3q, e31,   NA
X.X3r <-> X.X3r, e32,   NA
X.X3s <-> X.X3s, e33,   NA
X.X3t <-> X.X3t, e34,   NA
X.X3u <-> X.X3u, e35,   NA
X.X3v <-> X.X3v, e36,   NA
X.X4a <-> X.X4a, e37,   NA
X.X5q <-> X.X5q, e38,   NA
X.X5r <-> X.X5r, e39,   NA
X.X5s <-> X.X5s, e40,   NA
X.X8a <-> X.X8a, e41,   NA
X.X8b <-> X.X8b, e42,   NA
X.X8c <-> X.X8c, e43,   NA
X.X8d <-> X.X8d, e44,   NA
F1 <-> F1, NA, 1 
F2 <-> F2, NA, 1 
F3 <-> F3, NA, 1 
F4 <-> F4, NA, 1 
F5 <-> F5, NA, 1 
F6 <-> F6, NA, 1 
F7 <-> F7, NA, 1 
F8 <-> F8, NA, 1 
 
mydata.sem <- sem(model.mydata, mydata.cov, nrow(mydata))
# print results (fit indices, paramters, hypothesis tests) 
summary(mydata.sem)
# print standardized coefficients (loadings) 
std.coef(mydata.sem) 

 
Now my problems and questions are various:
1) Does "mydata" need to contain only the questions, or also my latent variables?
In other words, suppose that the mean of Q1, Q2, Q3 gives me a variable called
"OCB". Do I also need this mean in mydata?
2) In the specification of my model I didn't use anything like "F1 <-> F2". Is
this a problem? What does such a statement indicate? That I have a
mediation/moderation effect between variables?
3) Now, if you look at my code, you can see that I don't put in "mydata" the mean
value called "OCB" (see point 1), and I don't write anything about the relation
between F1 and F2, and when I run the sem function I receive these warnings:
 
1: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = 
vars,  :
  S is numerically singular: expect problems
2: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = 
vars,  :
  S is not positive-definite: expect problems
3: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names = 
vars,  :
  Could not compute QR decomposition of Hessian.
Optimization probably did not converge.

and after the summary I receive this error:

 coefficient covariances cannot be computed

What can I do about all this?

Hoping this problem is of interest to you, I wish you the best.

Costantino Milanese, a young researcher full of problems!
	[[alternative HTML version deleted]]

[R] Importing csv file with character values into sqlite3 and subsequent problem in R / RSQLite

2009-03-30 Thread Stephan Lindner
Dear all,


I'm trying to import a csv file into sqlite3 and from there into
R. Everything looks fine except that R outputs the character values in
an odd fashion: they are shown as "\"CHARACTER\"" instead of
"CHARACTER", but only if I show the character variable as a
vector. Does someone know why this happens? Below is a sample
code. The first part is written in bash. Of course I could just
read.csv for the spreadsheet, but the real datasets are more than 3
GB, that's why I'm using RSQLite (which is really awesome!). Also, I
could get rid of the "" in the csv file (the csv file has only
numbers, but it is easier for me to use identifiers such as v1 as
character strings), but I thought I'd first see whether there is a
different way to solve this issue.


Thanks! 


Stephan


<-- 

bash$ more example.csv
bash$ echo -e 
"\"001074034\",90,1,7,89,12\n\"001074034\",90,1,1,90,12\n\"001074034\",90,1,2,90,12\n\"001074034\",90,1,3,90,12"
 > example.csv
bash$ echo "create table t(v1,v2,v3,v4,v5,v6);" > example.sql
bash$ sqlite3 example.db < example.sql
bash$ echo -e ".separator , \n.import example.csv t" | sqlite3 example.db
bash$ R
> library(RSQLite)
Loading required package: DBI
> example.db <- dbConnect(SQLite(),"example.db")
> x <- dbGetQuery(example.db,"select * from t")
> x
   v1 v2 v3 v4 v5 v6
1 "001074034" 90  1  7 89 12
2 "001074034" 90  1  1 90 12
3 "001074034" 90  1  2 90 12
4 "001074034" 90  1  3 90 12

> x$v1
 [1] "\"001074034\"" "\"001074034\"" "\"001074034\"" "\"001074034\""

-->


Only the codes: 


<-- 

more example.csv
echo -e 
"\"001074034\",90,1,7,89,12\n\"001074034\",90,1,1,90,12\n\"001074034\",90,1,2,90,12\n\"001074034\",90,1,3,90,12"
 > example.csv
echo "create table t(v1,v2,v3,v4,v5,v6);" > example.sql
sqlite3 example.db < example.sql
echo -e ".separator , \n.import example.csv t" | sqlite3 example.db
R

library(RSQLite)
example.db <- dbConnect(SQLite(),"example.db")
x <- dbGetQuery(example.db,"select * from t")
x
x$v1

--> 




-- 
---
Stephan Lindner
University of Michigan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pgmm (Blundell-Bond) sample needed)

2009-03-30 Thread Millo Giovanni
Dear Ivo, dear list,

(see: Message: 70
Date: Thu, 26 Mar 2009 21:39:19 +
From: ivo...@gmail.com
Subject: [R] pgmm (Blundell-Bond) sample needed)

I think I finally figured out how to replicate your supersimple GMM
example with pgmm() so as to get the very same results as Stata.
Having no other regressors in the formula initially drove me crazy. This was a 
case where simpler models are trickier than more
complicated ones!

For the benefit of other GMM people on this list, here's a brief résumé
of our rather long private mail exchange of these days, answering to
some other pgmm()-related posts which have appeared on this list
lately. Sorry for the overlong posting but it might be worth the space.

I will refer to the very good Stata tutorial by David Roodman that Ivo
himself pointed me to, which gives a nice
(and free) theoretical intro as well. Please (the others) find it
here: http://repec.org/nasug2006/howtodoxtabond2.cgdev.pdf. As far as
textbooks are concerned, Arellano's
panel data book (Oxford) is the theoretical reference I would
suggest. 

There have been two separate issues: 
- syntax (how to get the right model)
- small sample behaviour (minimal time dimension to get estimates)

I'll start with this last one, then provide a quick "Rosetta stone" of
pgmm() and Stata commands producing the same results. The established
benchmarks for dynamic panels' GMM are the DPD routines written by Arellano et
al. for Gauss and later Ox, but  Stata is proven to give the same
results, and it is the established general reference for panel
data. Lastly I will add the usual examples found in the literature,
although they are very close relatives of 'example(pgmm)', so as to
show the correspondence between the models.

1) Small samples and N-asymptotics:
GMM needs big N, small T. Else you end up having more instruments than
observations and you get a "singular matrix" error (which, as Ivo
correctly found out, happens in the computation of the optimal
weights' matrix). While this is
probably going to be substituted with a more descriptive error
message, it still shows you the heart of the matter.
Yet Stata
gives you estimates in this case as well: as I suspected, it is
because it uses a generalized inverse (see Roodman's tutorial,
2.6). This looks theoretically ok. Whether this is meaningful in
applied practice is an issue I will discuss with the package
maintainer. IMHO it is not, apart maybe for illustrative purposes, and
it might well encourage bad habits (see the discussion about (not)
fitting the Grunfeld model by GMM on this list, some weeks ago).

2) fitting the simple models
Simplest possible model: AR(1) with individual effects
  x(i,t)= a*(x(i,t-1)) + bi + c

This is what Ivo asked for in the first place. As the usual example is on data 
from the Arellano and Bond paper,
available in package 'plm' as
 
> data(EmplUK)

I'll use log(emp) from this dataset as 'x', for ease of reproducibility. Same 
data are
available in Stata by 'use
"http://www.stata-press.com/data/r7/abdata.dta";'. The Stata dataset is
identical but for the variable names and the fact that in Stata you
have to generate logs beforehand (ugh!). I'm also adding the
'nomata' option to avoid complications, but this will be unnecessary on most
systems (not on mine...).

The system-GMM estimator (with robust SEs) in Stata is 'xtabond2 n
nL1, gmm(L.(n)) nomata robust' whose R equivalent is:

> sysmod <- pgmm( dynformula( log(emp) ~ 1, list(1)), data=EmplUK,
+   gmm.inst=~log(emp), lag.gmm=c(2,99),
+   effect="individual", model="onestep", transformation="ld" )
> summary(sysmod, robust=TRUE)

(note that although 'summary(sysmod)' does not report a constant, it's
actually there; this is an issue to be checked).

while the difference-GMM is 'xtabond2 n nL1, gmm(L.(n)) noleveleq
nomata robust', in R:

> diffmod <- pgmm( dynformula( log(emp) ~ 1, list(1)), data=EmplUK,
+   gmm.inst=~log(emp), lag.gmm=c(2,99),
+   effect="individual", model="onestep", transformation="d" )
> summary(diffmod,robust=TRUE)

The particular model Ivo asked for, using only lags 2-4 as
instruments, is 'xtabond2 x lx, gmm(L.(x),lag(1 3)) robust' in Stata
and only requires to set 'lag.gmm=c(2,4)' in the 'sysmod' above
(notice the difference in the lags specification!).

Note also that, unlike Ivo, I am using robust covariances.

3) fitting the standard examples from the literature.

'example(pgmm)' is a somewhat simplified version of the standard
Arellano-Bond example. For better comparability, here I am replicating
the results from the abest.do Stata script from
http://ideas.repec.org/c/boc/bocode/s435901.html (i.e., the results of
the Arellano and Bond paper done via xtabond2). The same output is also to
be found in Roodman's tutorial, 3.3. 

Here's how to replicate the output of abest.do:
(must execute the preceding lines in the file as well for data transf.)
 
* Replicate difference GMM runs in Arellano and Bond 1991, Table 4
* Column (a1)
xtabond2 n L(0/1).(l.n w) l(0/2).(

[R] 64 bit compiled version of R on windows

2009-03-30 Thread Vadlamani, Satish {FLNA}
Hi:
1) Does anyone have experience with 64 bit compiled version of R on windows? Is 
this available or one has to compile it oneself?
2) If we do compile the source in 64 bit, would we then need to compile any 
additional modules also in 64 bit?

I am just trying to prepare for the time when I will get larger datasets to 
analyze. Each of the datasets is about 1 GB in size and I will try to bring in 
about 16 of them in memory at the same time. At least that is the plan.

I asked a related question in the past and someone recommended the product
RevolutionR - I am looking into this also. If you can think of any other
options, please mention them. I have not been doing low-level programming for a
while now, and therefore self-compilation on Windows would be the least
preferable option (and then I would have to worry about how to compile any
modules that I need). Thanks.

Thanks.
Satish

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to input multiple .txt files

2009-03-30 Thread hadley wickham
On Mon, Mar 30, 2009 at 10:33 AM, Mike Lawrence  wrote:
> To repent for my sins, I'll also suggest that Hadley Wickham's "plyr"
> package (http://had.co.nz/plyr/) is also useful/parsimonious in this
> context:
>
> a <- ldply(cust1_files,read.table)

You might also want to do

names(cust1_files) <- basename(cust1_files)

so that you can easily see where each part of the data came from
(although I think this will only work in the next version of plyr)
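
A self-contained sketch of the whole pattern (the directory name, file
pattern, and header= setting are assumptions for illustration):

library(plyr)
# collect the per-customer files and name them by file so the origin of
# each row can be tracked (per the caveat above, the name-tracking may
# need a newer plyr)
cust1_files <- list.files("data", pattern = "^cust1.*\\.txt$", full.names = TRUE)
names(cust1_files) <- basename(cust1_files)
a <- ldply(cust1_files, read.table, header = TRUE)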

Hadley

-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sliding window over irregular intervals

2009-03-30 Thread Michael Lawrence
On Mon, Mar 30, 2009 at 6:01 AM, Irene Gallego Romero wrote:

> Dear all,
>
> I have some very big data files that look something like this:
>
> id chr pos ihh1 ihh2 xpehh
> rs5748748 22 15795572 0.0230222 0.0268394 -0.153413
> rs5748755 22 15806401 0.0186084 0.0268672 -0.367296
> rs2385785 22 15807037 0.0198204 0.0186616 0.0602451
> rs1981707 22 15809384 0.0299685 0.0176768 0.527892
> rs1981708 22 15809434 0.0305465 0.0187227 0.489512
> rs11914222 22 15810040 0.0307183 0.0172399 0.577633
> rs4819923 22 15813210 0.02707 0.0159736 0.527491
> rs5994105 22 15813888 0.025202 0.0141296 0.578651
> rs5748760 22 15814084 0.0242894 0.0146486 0.505691
> rs2385786 22 15816846 0.0173057 0.0107816 0.473199
> rs1990483 22 15817310 0.0176641 0.0130525 0.302555
> rs5994110 22 15821524 0.0178411 0.0129001 0.324267
> rs17733785 22 15822154 0.0201797 0.0182093 0.102746
> rs7287116 22 15823131 0.0201993 0.0179028 0.12069
> rs5748765 22 15825502 0.0193195 0.0176513 0.090302
>
> I'm trying to extract the maximum and minimum xpehh (last column) values
> within a sliding window (non-overlapping), of width 1 (calculated
> relative to pos (third column)). However, as you can tell from the brief
> excerpt here, although all possible intervals will probably be covered by at
> least one data point, the number of data points will be variable
> (incidentally, if anyone knows of a way to obtain this number, that would be
> lovely), as will the spacing between them. Furthermore, values of chr
> (second column) will range from 1 to 22, and values of pos will be
> overlapping across them; I want to evaluate the window separately for each
> value of chr.
>

The IRanges package from the Bioconductor project attempts to solve problems
like these. For example, to count the number of overlapping intervals at a
given position in the chromosome, you would use the coverage() function. The
RangedData class is designed to store data like yours and rdapply() makes it
easy to perform operations one chromosome at a time.

That said, I don't think it has any easy way to solve your problem of
calculating quantiles. That's a feature that needs to be added to the
package. I could imagine something like (with the development version),
calling disjointBins() to separate the ranges in bins where there is no
overlap, then converting each bin into an Rle, and then using pmin/max on
the Rle objects in series to get your answer.

Anyway, you probably want to check out IRanges.
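
For the windowing step itself, a plain base-R sketch may also help while
exploring (here d stands for a data frame with the columns shown above,
and the 10000 bp window width is an assumed example value; the n column
also answers the per-window point-count question):

win <- 10000  # assumed window width
res <- do.call(rbind, lapply(split(d, d$chr), function(dc) {
  bin <- floor(dc$pos / win)        # non-overlapping window id per row
  data.frame(chr       = dc$chr[1],
             win.start = sort(unique(bin)) * win,
             n         = as.vector(table(bin)),   # points per window
             xpehh.max = tapply(dc$xpehh, bin, max),
             xpehh.min = tapply(dc$xpehh, bin, min))
}))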

Michael


>
> I've looked at the help and FAQ on sliding windows, but I'm a relative
> newcomer to R and cannot find a way to do what I need to do. Everything I've
> managed to unearth so far seems geared towards smoother time series. Any
> help on this problem would be vastly appreciated.
>
> Thanks,
> Irene
>
> --
> Irene Gallego Romero
> Leverhulme Centre for Human Evolutionary Studies
> University of Cambridge
> Fitzwilliam St
> Cambridge
> CB2 1QH
> UK
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] HELP WITH SEM LIBRARY AND WITH THE MODEL'S SPECIFICATION

2009-03-30 Thread John Fox
Dear Costantino,

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
> Behalf Of Analisi Dati
> Sent: March-30-09 11:13 AM
> To: r-help@r-project.org
> Subject: [R] HELP WITH SEM LIBRARY AND WITH THE MODEL'S SPECIFICATION
> 
> Dear users,
> i'm using the sem package in R, because i need to improve a confermative
> factor analisys.
> I have so many questions in my survey, and i suppose, for example,  that
> Question 1 (Q1) Q2 and Q3 explain the same thing (factor F1), Q4,Q5 and Q6
> explain F2 and Q7 and Q8 explain F3...
> For check that what i supposed is true, i run this code to see if the
values
> of loadings are big or not.
> (In this code i used more than 3 factors)
> 

. . . (many lines elided)

> 
> 
> Now the problems, and my questions, are various:
> 1)In "mydata" i need to have only the questions or also my latent
variables?
> In other words, i suppose that the mean of  Q1,Q2,Q3 give me a variable
> called "OCB". In mydata i need also this mean???

No. sem() recognizes as latent variables (F1, F2, etc.) those variables that
do not appear in the observed-variable covariance matrix. There are several
examples in ?sem that illustrate this point. Moreover, the latent variables
are not in general simply means of observed variables.

> 2)In the specification of my model, i didn't use nothing like "F1<-
> >F2..", is this a problem? this sentence what indicates??? that i have
a
> mediation/moderation effect between variables???

By not specifying F1 <-> F2, you imply that the factors F1 and F2 are
uncorrelated. This isn't illogical, but it produces a very restrictive
model. Conversely, specifying F1 <-> F2 causes the covariance of F1 and F2
to be estimated; because you set the variances of the factors to 1, this
covariance would be the factor correlation.
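
In the specify.model() syntax used in the original post, freeing that
correlation is one extra line (the label rho12 is an arbitrary parameter
name):

F1 <-> F2, rho12, NA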

> 3)Now, if you look my code,you could see that i don't put in "mydata" the
> mean value called "OCB" (see point 1), and i don't write nothing about the
> relation between F1 and F2, and when i run the sem function i receive
these
> warnings:
> 
> 1: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names =
> vars,  :
>   S is numerically singular: expect problems
> 2: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names =
> vars,  :

That seems to me a reasonably informative error message: The
observed-variable covariance matrix is singular. This could happen, e.g., if
two observed variables are perfectly correlated, if an observed variable had
0 variance, or if there were more observed variables than observations.

>   S is not positive-definite: expect problems
> 3: In sem.default(ram = ram, S = S, N = N, param.names = pars, var.names =
> vars,  :

That S is singular implies that it is not positive-definite, but because a
non-singular matrix need not be positive-definite, sem() checks for both.

>   Could not compute QR decomposition of Hessian.
> Optimization probably did not converge.
> 
> and after the summary i receive this error:
> 
>  coefficient covariances cannot be computed

These are the problems that sem() told you to expect.

> 
> What i can do for all this

Without more information, it's not possible to know. You should figure out
why the observed-variable covariance matrix is singular.

I hope this helps,
 John

> 
> Hoping in your interest about this problem, i wish you the best.
> 
> Costantino Milanese, a young researcher full of problems!
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2-geom_text()

2009-03-30 Thread Felipe Carrillo

Hi: I need help with geom_text().
 I would like to count the number of Locations
 and put the sum of it right above each bar.

x <- "Location Lake_dens Fish Pred
Lake1   1.132   1   0.115
Lake1   0.627   1   0.148
Lake1   1.324   1   0.104
Lake1   1.265   1   0.107
Lake2   1.074   0   0.096
Lake2   0.851   0   0.108
Lake2   1.098   0   0.095
Lake2   0.418   0   0.135
Lake2   1.256   1   0.088
Lake2   0.554   1   0.126
Lake2   1.247   1   0.088
Lake2   0.794   1   0.112
Lake2   0.181   0   0.152
Lake3   1.694   0   0.001
Lake3   1.018   0   0.001
Lake3   2.880   0"
DF <- read.table(textConnection(x), header = TRUE)
 p <- ggplot(DF,aes(x=Location)) + geom_bar()
 p + geom_text(aes(y=Location),label=sum(count)) # Error because count doesn't 
exist in dataset

 What should I use instead of 'count' to be able to sum the number
 of Locations?

Felipe D. Carrillo  
Supervisory Fishery Biologist  
Department of the Interior  
US Fish & Wildlife Service  
California, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 64 bit compiled version of R on windows

2009-03-30 Thread Duncan Murdoch

On 3/30/2009 12:46 PM, Vadlamani, Satish {FLNA} wrote:

Hi:
1) Does anyone have experience with a 64-bit compiled version of R on Windows? Is
this available, or does one have to compile it oneself?
2) If we do compile the source in 64 bit, would we then need to compile any
additional modules also in 64 bit?


R for Windows is compiled using the MinGW port of gcc, and the 64 bit 
version of that compiler is not really ready for general use yet, so 
compiling for 64 bits is not completely straightforward.  Revolution 
Computing has announced on the R-devel list that they are beta testing a 
build, with some information at


http://www.revolution-computing.com/products/windows-64bit.php

The page says it is scheduled for release at the end of March, so there 
should be something available soon.


Duncan Murdoch



I am just trying to prepare for the time when I will get larger datasets to 
analyze. Each of the datasets is about 1 GB in size and I will try to bring in 
about 16 of them in memory at the same time. At least that is the plan.

I asked a related question in the past and someone recommended the product
RevolutionR - I am looking into this also. If you can think of any other
options, please mention them. I have not been doing low-level programming for a
while now, and therefore self-compilation on Windows would be the least
preferable option (and then I would have to worry about how to compile any
modules that I need). Thanks.

Thanks.
Satish

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constrained dependent optimization.

2009-03-30 Thread Paul Smith
Apparently, the convergence is faster if one uses this new swap function:

swapfun <- function(x,N=100) {
 loc <- c(sample(1:(N/2),size=1,replace=FALSE),sample((N/2):N,1))
 tmp <- x[loc[1]]
 x[loc[1]] <- x[loc[2]]
 x[loc[2]] <- tmp
 x
}

It seems that within 20 million iterations, one gets the exact
optimal solution, which does not take too long.

Paul


On Mon, Mar 30, 2009 at 5:11 PM, Paul Smith  wrote:
> Optim with SANN also solves your example:
>
> ---
>
> f <- function(x) sum(c(1:50,50:1)*x)
>
> swapfun <- function(x,N=100) {
>  loc <- sample(N,size=2,replace=FALSE)
>  tmp <- x[loc[1]]
>  x[loc[1]] <- x[loc[2]]
>  x[loc[2]] <- tmp
>  x
> }
>
> N <- 100
>
> opt1 <- optim(fn=f, par=sample(1:N,N), gr=swapfun, method="SANN",
>               control=list(maxit=5e6, fnscale=-1, trace=10))
> opt1$par
> opt1$value
>
> ---
>
> We need to specify a large number of iterations to get the optimal
> solution. The objective function at the optimum is 170425, and one
> gets a close value with optim and SANN.
>
> Paul
>
>
> On Mon, Mar 30, 2009 at 2:22 PM, Hans W. Borchers
>  wrote:
>>
>> Imagine you want to minimize the following linear function
>>
>>    f <- function(x) sum( c(1:50, 50:1) * x / (50*51) )
>>
>> on the set of all permutations of the numbers 1,..., 100.
>>
>> I wonder how you will do that with lpSolve? I would simply order
>> the coefficients and then sort the numbers 1,...,100 accordingly.
>>
>> I am also wondering how optim with "SANN" could be applied here.
>>
>> As this is a problem in the area of discrete optimization and
>> constraint programming, I propose to use an appropriate program
>> here such as the free software Bprolog. I would be interested to
>> learn what others propose.
>>
>> Of course, if we don't know anything about the function f then
>> it amounts to an exhaustive search on the 100! permutations --
>> probably not a feasible job.
>>
>> Regards,  Hans Werner
>>
>>
>>
>> Paul Smith wrote:
>>>
>>> On Sun, Mar 29, 2009 at 9:45 PM,   wrote:
>>>> I have an optimization question that I was hoping to get some suggestions
>>>> on how best to go about solving it. I would think there is probably a
>>>> package that addresses this problem.
>>>>
>>>> This is an ordering optimization problem. Best to describe it with a
>>>> simple example. Say I have 100 "bins", each with a ball in it numbered
>>>> from 1 to 100. Each bin can only hold one ball. This optimization is that
>>>> I have a function 'f' that takes this array of bins and returns a number.
>>>> The call f(1,2,3,4) would return a different number from that of
>>>> f(2,1,3,4). The optimization is finding the optimum order of these balls
>>>> so as to produce a minimum value from 'f'. I cannot use the regular
>>>> 'optim' algorithms because a) the values are discrete, b) the values are
>>>> dependent, i.e. when the "variable" representing the bin location is
>>>> changed (in this example a new ball is put there) the existing ball will
>>>> need to be moved to another bin (probably swapping positions), and c)
>>>> each "variable" is constrained, in the example above the only allowable
>>>> values are integers from 1-100. So the problem becomes finding the
>>>> optimum order of the "balls".
>>>>
>>>> Any suggestions?
>>>
>>> If your function f is linear, then you can use lpSolve.
>>>
>>> Paul
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> --
>> View this message in context: 
>> http://www.nabble.com/Constrined-dependent-optimization.-tp22772520p22782922.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Importing csv file with character values into sqlite3 and subsequent problem in R / RSQLite

2009-03-30 Thread Gabor Grothendieck
There are some examples of reading files into sqlite on the
sqldf home page:

http://sqldf.googlecode.com
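
For the embedded quotes themselves, a one-line base-R cleanup after the
query is also an option (a sketch, reusing the x from the example below):

x$v1 <- gsub('"', '', x$v1)  # strip the literal quote characters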


On Mon, Mar 30, 2009 at 12:19 PM, Stephan Lindner  wrote:
> Dear all,
>
>
> I'm trying to import a csv file into sqlite3 and from there into
> R. Everything looks fine except that R outputs the character values in
> an odd fashion: they are shown as "\"CHARACTER\"" instead of
> "CHARACTER", but only if I show the character variable as a
> vector. Does someone know why this happens? Below is a sample
> code. The first part is written in bash. Of course I could just
> read.csv for the spreadsheet, but the real datasets are more than 3
> GB, that's why I'm using RSQLite (which is really awesome!). Also, I
> could get rid of the "" in the csv file (the csv file has only
> numbers, but it is easier for me to use identifiers such as v1 as
> character strings), but I thought I'd first see whether there is a
> different way to solve this issue.
>
>
> Thanks!
>
>
>        Stephan
>
>
> <--
>
> bash$ more example.csv
> bash$ echo -e 
> "\"001074034\",90,1,7,89,12\n\"001074034\",90,1,1,90,12\n\"001074034\",90,1,2,90,12\n\"001074034\",90,1,3,90,12"
>  > example.csv
> bash$ echo "create table t(v1,v2,v3,v4,v5,v6);" > example.sql
> bash$ sqlite3 example.db < example.sql
> bash$ echo -e ".separator , \n.import example.csv t" | sqlite3 example.db
> bash$ R
>> library(RSQLite)
> Loading required package: DBI
>> example.db <- dbConnect(SQLite(),"example.db")
>> x <- dbGetQuery(example.db,"select * from t")
>> x
>           v1 v2 v3 v4 v5 v6
> 1 "001074034" 90  1  7 89 12
> 2 "001074034" 90  1  1 90 12
> 3 "001074034" 90  1  2 90 12
> 4 "001074034" 90  1  3 90 12
>
>> x$v1
>  [1] "\"001074034\"" "\"001074034\"" "\"001074034\"" "\"001074034\""
>
> -->
>
>
> Only the codes:
>
>
> <--
>
> more example.csv
> echo -e 
> "\"001074034\",90,1,7,89,12\n\"001074034\",90,1,1,90,12\n\"001074034\",90,1,2,90,12\n\"001074034\",90,1,3,90,12"
>  > example.csv
> echo "create table t(v1,v2,v3,v4,v5,v6);" > example.sql
> sqlite3 example.db < example.sql
> echo -e ".separator , \n.import example.csv t" | sqlite3 example.db
> R
>
> library(RSQLite)
> example.db <- dbConnect(SQLite(),"example.db")
> x <- dbGetQuery(example.db,"select * from t")
> x
> x$v1
>
> -->
>
>
>
>
> --
> ---
> Stephan Lindner
> University of Michigan
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mature SOAP Interface for R

2009-03-30 Thread Tobias Verbeke

Michael Lawrence wrote:

On Sat, Mar 28, 2009 at 6:08 PM, zubin  wrote:


Hello, we are writing rich internet user interfaces and would like to call R for
some of the computational needs on the data, as well as some creation of
image files.  Our objects communicate via the SOAP interface.  We have been
researching the various packages to expose R as a SOAP service.

There are no current CRAN SOAP packages, however.

Found 3 to date:

RSOAP (http://sourceforge.net/projects/rsoap/)
SSOAP http://www.omegahat.org/SSOAP/

looks like a commercial version?
http://random-technologies-llc.com/products/rsoap

Does anyone have experience with these 3 and can recommend the most
'mature' R - SOAP interface package?



Well, SSOAP is (the last time I checked) just a SOAP client. rsoap (if we're
talking about the same package) is actually a python SOAP server that
communicates to R via rpy.

You might want to check out the RWebServices package in Bioconductor. I
think it uses Java for its SOAP handling.


Connecting to R as a server via SOAP is one of the
many ways the biocep project

http://www.biocep.net

allows one to make use of R in statistical application
development (there is also RESTful web services,
connections over RMI, etc.).

HTH,
Tobias

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] circular analysis

2009-03-30 Thread Blanka Vlasakova
Hi,
I am looking for a way to analyze a dataset with a circular dependent
variable and three independent factors. To be specific, the circular
variable comprises the arrival times of pollinators at flowers. The
independent variables are pollinator species, flower sex and locality. I
have failed to find a way to include all three factors. The "circular"
package seems to enable testing of a single factor - or am I wrong?
Does the "circular" or any other package enable such an analysis?
Many thanks
Blanka Vlasakova

-- 
Department of Botany
Charles University in Prague
Benatska 2
128 01  Prague 2
CZECH REPUBLIC

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculating First Occurrence by a factor

2009-03-30 Thread jwg20

I'm having difficulty finding a solution to my problem without using a
for loop. For the amount of data I (will) have, the for loop will probably
be too slow. I tried searching around before posting and couldn't find
anything; hopefully it's not embarrassingly easy.

Consider the data.frame, Data,  below

Data
Sub Tr  IA   FixInx  FixTime
p1   t1  1    1        200
p1   t1  2    2        350
p1   t1  2    3        500
p1   t1  3    4        600
p1   t1  3    5        700
p1   t1  4    6        850
p1   t1  3    7        1200
p1   t1  5    8        1350
p1   t1  5    9        1500

What I'm trying to do is for each unique IA get the first occurring FixTime.
This will eventually need to be done by each Trial (Tr) and each Subject
Number (Sub). FixInx is essentially the row index within a trial. The
resulting data.frame is below.

Sub Tr  IA  FirstFixTime
p1   t1  1   200
p1   t1  2   350
p1   t1  3   600
p1   t1  4   850
p1   t1  5   1350

Here is the solution I have now.  

# get the minimum fix index by Sub, Tr, and IA... I can use this min
# fix index to pull out the desired fixtime
agg = aggregate(data$FixInx, list(data$Sub, data$Tr, data$IA), min)

agg$firstfixtime = 0 # new column for results

# cycle through rows and get each agg$firstfixtime from FixTime in
# matching rows
for (rown in 1:length(rownames(agg))){
  agg$firstfixtime[rown] = as.character(data[data$Tr == agg$Group.2[rown] &
    data$Sub == agg$Group.1[rown] & data$IA == agg$Group.3[rown] &
    data$FixInx == agg$x[rown], ]$FixTime)
}
-- 
View this message in context: 
http://www.nabble.com/Calculating-First-Occurance-by-a-factor-tp22789964p22789964.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating First Occurrence by a factor

2009-03-30 Thread Dimitris Rizopoulos

one way is:

ind <- ave(Data$IA, Data$Sub, Data$Tr, FUN = function (x) !duplicated(x))
Data[as.logical(ind), ]


I hope it helps.

Best,
Dimitris


jwg20 wrote:

I'm having difficulty finding a solution to my problem without using a
for loop. For the amount of data I (will) have, the for loop will probably
be too slow. I tried searching around before posting and couldn't find
anything, hopefully it's not embarrassingly easy.  


Consider the data.frame, Data,  below

Data
Sub Tr  IA   FixInx  FixTime
p1   t1  1    1        200
p1   t1  2    2        350
p1   t1  2    3        500
p1   t1  3    4        600
p1   t1  3    5        700
p1   t1  4    6        850
p1   t1  3    7        1200
p1   t1  5    8        1350
p1   t1  5    9        1500

What I'm trying to do is for each unique IA get the first occurring FixTime.
This will eventually need to be done by each Trial (Tr) and each Subject
Number (Sub). FixInx is essentially the number of rows in a trial. The
resulting data.frame is below.

Sub Tr  IA  FirstFixTime
p1   t1  1   200
p1   t1  2   350
p1   t1  3   600
p1   t1  4   850
p1   t1  5   1350

Here is the solution I have now.  


agg = aggregate(data$FixInx, list(data$Sub, data$Tr, data$IA), min) #get the
minimum fix index by Sub, Tr, and IA... I can use this min fix index to pull
out the desired fixtime

agg$firstfixtime = 0 # new column for results

for (rown in 1:length(rownames(agg))){ #cycle through rows and get each
data$firstfixtime from FixTime in matching rows 
  agg$firstfixtime[rown] = as.character(data[data$Tr == agg$Group.2[rown] &

data$Sub == agg$Group.1[rown] & data$IA == agg$Group.3[rown] & data$FixInx
== agg$x[rown], ]$FixTime)
}


--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating First Occurrence by a factor

2009-03-30 Thread Mike Lawrence
I discovered Hadley Wickham's "plyr" package last week and have found
it very useful in circumstances like this:

library(plyr)

firstfixtime = ddply(
    .data = data
    , .variables = c('Sub','Tr','IA')
    , .fun = function(df){
        df$FixTime[which.min(df$FixInx)]
    }
)

> On Mon, Mar 30, 2009 at 3:40 PM, jwg20  wrote:
>>
>> I'm having difficulty finding a solution to my problem without using a
>> for loop. For the amount of data I (will) have, the for loop will probably
>> be too slow. I tried searching around before posting and couldn't find
>> anything, hopefully it's not embarrassingly easy.
>>
>> Consider the data.frame, Data,  below
>>
>> Data
>> Sub Tr  IA   FixInx  FixTime
>> p1   t1  1    1        200
>> p1   t1  2    2        350
>> p1   t1  2    3        500
>> p1   t1  3    4        600
>> p1   t1  3    5        700
>> p1   t1  4    6        850
>> p1   t1  3    7        1200
>> p1   t1  5    8        1350
>> p1   t1  5    9        1500
>>
>> What I'm trying to do is for each unique IA get the first occurring FixTime.
>> This will eventually need to be done by each Trial (Tr) and each Subject
>> Number (Sub). FixInx is essentially the number of rows in a trial. The
>> resulting data.frame is below.
>>
>> Sub Tr  IA  FirstFixTime
>> p1   t1  1   200
>> p1   t1  2   350
>> p1   t1  3   600
>> p1   t1  4   850
>> p1   t1  5   1350
>>
>> Here is the solution I have now.
>>
>> agg = aggregate(data$FixInx, list(data$Sub, data$Tr, data$IA), min) #get the
>> minimum fix index by Sub, Tr, and IA... I can use this min fix index to pull
>> out the desired fixtime
>>
>> agg$firstfixtime = 0 # new column for results
>>
>> for (rown in 1:length(rownames(agg))){ #cycle through rows and get each
>> data$firstfixtime from FixTime in matching rows
>>  agg$firstfixtime[rown] = as.character(data[data$Tr == agg$Group.2[rown] &
>> data$Sub == agg$Group.1[rown] & data$IA == agg$Group.3[rown] & data$FixInx
>> == agg$x[rown], ]$FixTime)
>> }
>> --
>> View this message in context: 
>> http://www.nabble.com/Calculating-First-Occurance-by-a-factor-tp22789964p22789964.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>



-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tinyurl.com/mikes-public-calendar

~ Certainty is folly... I think. ~

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating First Occurrence by a factor

2009-03-30 Thread Jason Gullifer
Thank you Mike and Dimitris for your replies.

I was able to get Mike's command to work and it does what I want (and fast
too!) I hadn't looked into the plyr package at all, but I have seen it load
when loading the reshape package. (Another useful package for manipulating
data frames!)

Thanks again.
-Jason

On Mon, Mar 30, 2009 at 3:58 PM, Mike Lawrence  wrote:

> I discovered Hadley Wickham's "plyr" package last week and have found
> it very useful in circumstances like this:
>
> library(plyr)
>
> firstfixtime = ddply(
>   .data = data
>   , .variables = c('Sub','Tr','IA')
>   , .fun = function(df){
>   df$FixTime[which.min(df$FixInx)]
>   }
> )
>
> > On Mon, Mar 30, 2009 at 3:40 PM, jwg20  wrote:
> >>
> >> I'm having difficulty finding a solution to my problem without
> using a
> >> for loop. For the amount of data I (will) have, the for loop will
> probably
> >> be too slow. I tried searching around before posting and couldn't find
> >> anything, hopefully it's not embarrassingly easy.
> >>
> >> Consider the data.frame, Data,  below
> >>
> >> Data
> >> Sub Tr  IA   FixInx  FixTime
> >> p1   t1  1    1        200
> >> p1   t1  2    2        350
> >> p1   t1  2    3        500
> >> p1   t1  3    4        600
> >> p1   t1  3    5        700
> >> p1   t1  4    6        850
> >> p1   t1  3    7        1200
> >> p1   t1  5    8        1350
> >> p1   t1  5    9        1500
> >>
> >> What I'm trying to do is for each unique IA get the first occurring
> FixTime.
> >> This will eventually need to be done by each Trial (Tr) and each Subject
> >> Number (Sub). FixInx is essentially the number of rows in a trial. The
> >> resulting data.frame is below.
> >>
> >> Sub Tr  IA  FirstFixTime
> >> p1   t1  1   200
> >> p1   t1  2   350
> >> p1   t1  3   600
> >> p1   t1  4   850
> >> p1   t1  5   1350
> >>
> >> Here is the solution I have now.
> >>
> >> agg = aggregate(data$FixInx, list(data$Sub, data$Tr, data$IA), min) #get
> the
> >> minimum fix index by Sub, Tr, and IA... I can use this min fix index to
> pull
> >> out the desired fixtime
> >>
> >> agg$firstfixtime = 0 # new column for results
> >>
> >> for (rown in 1:length(rownames(agg))){ #cycle through rows and get each
> >> data$firstfixtime from FixTime in matching rows
> >>  agg$firstfixtime[rown] = as.character(data[data$Tr == agg$Group.2[rown]
> &
> >> data$Sub == agg$Group.1[rown] & data$IA == agg$Group.3[rown] &
> data$FixInx
> >> == agg$x[rown], ]$FixTime)
> >> }
> >> --
> >> View this message in context:
> http://www.nabble.com/Calculating-First-Occurance-by-a-factor-tp22789964p22789964.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
>
>
> --
> Mike Lawrence
> Graduate Student
> Department of Psychology
> Dalhousie University
>
> Looking to arrange a meeting? Check my public calendar:
> http://tinyurl.com/mikes-public-calendar
>
> ~ Certainty is folly... I think. ~
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix max by row

2009-03-30 Thread Wacek Kusnierczyk
Bert Gunter wrote:
>  
> Serves me right, I suppose. Timing seems also very dependent on the
> dimensions of the matrix. Here's what I got with my inadequate test:
>
>   
>> x <- matrix(rnorm(3e5),ncol=3)
>> 
> ## via apply
>   
>> system.time(apply(x,1,max))
>> 
>user  system elapsed 
>2.090.022.10
>
> ## via pmax 
>   
>> system.time(do.call(pmax,data.frame(x)))
>> 
>user  system elapsed 
>0.100.020.11 
>   
>
>   

Yes, similar to what I got. But with the transpose, the ratio is way
more than inverted:

waku = expression(matrix(apply(m, 1, max), nrow(m)))
bert = expression(do.call(pmax, data.frame(m)))

library(rbenchmark)

m = matrix(rnorm(1e6), ncol=10)
benchmark(replications=10, columns=c('test', 'elapsed'),
order='elapsed',
   waku=waku,
   bert=bert)
#   test elapsed
# 2 bert   1.633
# 1 waku   9.974

m = t(m)
benchmark(replications=10, columns=c('test', 'elapsed'),
order='elapsed',
   waku=waku,
   bert=bert)
#   test elapsed
# 1 waku   0.507
# 2 bert  27.261
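
For completeness, a third route not benchmarked in this exchange (a
sketch; relative timings will again depend on the matrix shape):

# row maxima via max.col(), which returns the column index of each
# row's maximum; matrix indexing then extracts the values
rowmax <- function(m) m[cbind(seq_len(nrow(m)), max.col(m))]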
  

> Draw your own conclusions!
>   

My favourite: you should have specified what 'large matrices' means.

vQ

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cmprsk- another survival-depedent package causes R crash

2009-03-30 Thread Terry Therneau
> As our package developers discussed the incompatibility between the Design and
survival packages, I faced another problem with cmprsk - a survival-dependent
package.
> The problem is exactly like what happened with the Design package: when I just
started running the cuminc function, R suddenly closed.
> These incidents suggest that many other survival-dependent packages may be
involved in the problem.

  I don't see how this is related to survival.  I just checked the source code
to the cmprsk function, and it has no dependencies on my library.  As I would
expect, the cmprsk function works as expected on our machines.
  Could you send a reproducible example?
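
For anyone who wants to try this on their own installation, a minimal
call along these lines exercises cuminc; the data here are simulated,
not from the original report:

library(cmprsk)
set.seed(1)
ftime <- rexp(200)                        # follow-up times
fstatus <- sample(0:2, 200, replace=TRUE) # 0 = censored; 1, 2 = competing causes
print(cuminc(ftime, fstatus))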
  
Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating First Occurrence by a factor

2009-03-30 Thread hadley wickham
On Mon, Mar 30, 2009 at 2:58 PM, Mike Lawrence  wrote:
> I discovered Hadley Wickham's "plyr" package last week and have found
> it very useful in circumstances like this:
>
> library(plyr)
>
> firstfixtime = ddply(
>       .data = data
>       , .variables = c('Sub','Tr','IA')
>       , .fun <- function(df){
>               df$FixTime[which.min(df$FixInx)]
>       }
> )

Or to save a little typing:

ddply(data, .(Sub, Tr, IA), colwise(min, .(FixTime)))

Hadley


-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cmprsk- another survival-depedent package causes R crash

2009-03-30 Thread Nguyen Dinh Nguyen
Dear Terry,
 When I ran the cuminc function (my saved command, which used to work
previously), R suddenly shut down; therefore, there is no error message. This
happened not only on my PC (Windows, Service Pack 3) but also on my
colleagues' machines.
Regards
Nguyen Nguyen

-Original Message-
From: Terry Therneau [mailto:thern...@mayo.edu] 
Sent: Tuesday, 31 March 2009 7:32 AM
To: Nguyen Dinh Nguyen
Cc: tlum...@u.washington.edu; r-help@r-project.org
Subject: Re: [R] cmprsk- another survival-depedent package causes R crash

> As our package developers discussed the incompatibility between the Design
and survival packages, I faced another problem with cmprsk - a
survival-dependent package.
> The problem is exactly like what happened with the Design package: when I
just started running the cuminc function, R suddenly closed.
> These incidents suggest that many other survival-dependent packages may be
involved in the problem.

  I don't see how this is related to survival.  I just checked the source
code to the cmprsk function, and it has no dependencies on my library.  As I
would expect, the cmprsk function works as expected on our machines.
  Could you send a reproducible example?
  
Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2-geom_text()

2009-03-30 Thread Felipe Carrillo

Thanks Paul, I tried to use ..count.. once but it didn't work. What I realized
was that I was missing 'stat="bin"'. Thanks for your help.
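
Putting the working version together with the data from the original
post (DF and p as built there):

library(ggplot2)
p <- ggplot(DF, aes(x = Location)) + geom_bar()
# stat="bin" computes ..count.. per Location; vjust=1 places the label
# just inside the top of each bar
p + geom_text(aes(label = ..count..), stat = "bin", vjust = 1, colour = "white")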


--- On Mon, 3/30/09, Paul Murrell  wrote:

> From: Paul Murrell 
> Subject: Re: [R] ggplot2-geom_text()
> To: mazatlanmex...@yahoo.com
> Cc: r-h...@stat.math.ethz.ch
> Date: Monday, March 30, 2009, 2:46 PM
> Hi
> 
> 
> Felipe Carrillo wrote:
> > Hi: I need help with geom_text().
> >  I would like to count the number of Locations
> >  and put the sum of it right above each bar.
> > 
> > x <- "Location Lake_dens Fish Pred
> > Lake1   1.132   1   0.115
> > Lake1   0.627   1   0.148
> > Lake1   1.324   1   0.104
> > Lake1   1.265   1   0.107
> > Lake2   1.074   0   0.096
> > Lake2   0.851   0   0.108
> > Lake2   1.098   0   0.095
> > Lake2   0.418   0   0.135
> > Lake2   1.256   1   0.088
> > Lake2   0.554   1   0.126
> > Lake2   1.247   1   0.088
> > Lake2   0.794   1   0.112
> > Lake2   0.181   0   0.152
> > Lake3   1.694   0   0.001
> > Lake3   1.018   0   0.001
> > Lake3   2.880   0"
> > DF <- read.table(textConnection(x), header = TRUE)
> >  p <- ggplot(DF,aes(x=Location)) + geom_bar()
> >  p + geom_text(aes(y=Location),label=sum(count)) #
> Error because count doesn't exist in dataset
> > 
> >  What should I use instead of 'count' to be
> able to sum the number
> >  of Locations?
> 
> 
> How about ... ?
> 
>  p + geom_text(aes(label=..count..), stat="bin",
>vjust=1, colour="white")
> 
> Paul
> 
> 
> > Felipe D. Carrillo  
> > Supervisory Fishery Biologist  
> > Department of the Interior  
> > US Fish & Wildlife Service  
> > California, USA
> > 
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
> 
> -- 
> Dr Paul Murrell
> Department of Statistics
> The University of Auckland
> Private Bag 92019
> Auckland
> New Zealand
> 64 9 3737599 x85392
> p...@stat.auckland.ac.nz
> http://www.stat.auckland.ac.nz/~paul/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Kruskal-Wallis-test: Posthoc-test?

2009-03-30 Thread Rabea Sutter

Hello.

We have some questions concerning the statistical analysis of a dataset.
We aim to compare the means of more than 2 independent samples; the sample
sizes are unbalanced. The requirements of normality and variance homogeneity
were not met even after transforming the data, so we applied a nonparametric
test: the Kruskal-Wallis test (H-test). The null hypothesis was rejected.
Now we are trying to find a suitable post-hoc test in order to find out
which sample means actually differ.

1. We think that the Behrens-Fisher test and the Steel multiple comparison
test are not applicable, because as far as we know they assume normality. Is
that right?
2. The statistical literature suggests the Nemenyi test as a post-hoc test.
But this test in general requires balanced sample sizes, so we would need a
special variant of it. Is it possible to run such a test in R?
3. We could also test all the samples against each other with nonparametric
Mann-Whitney U tests and correct for the multiple comparisons (m = 11)
according to Bonferroni. Is this testing method allowed?
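
On point 3, base R can run exactly that procedure in one call; a minimal
sketch on invented data (the real dataset is not shown in the post):

# Toy data: three unbalanced groups.
set.seed(1)
group    <- factor(rep(c("A", "B", "C"), times = c(8, 12, 10)))
response <- rexp(30, rate = as.integer(group))

# All pairwise Mann-Whitney (Wilcoxon rank-sum) tests with
# Bonferroni-adjusted p-values:
pairwise.wilcox.test(response, group, p.adjust.method = "bonferroni")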

We would be very grateful, if anyone could help us. Thank you very much!
Christine Hellmann and Rabea Sutter

-- 
View this message in context: 
http://www.nabble.com/Kruskal-Wallis-test%3A-Posthoc-test--tp22794025p22794025.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] use R Group SFBA April meeting reminder; video of Feb kickoff

2009-03-30 Thread Jim Porzak
Next week Wednesday evening, April 8th, Mike Driscoll will be talking
about "Building Web Dashboards using R"
see: http://www.meetup.com/R-Users/calendar/9718968/ for details & to RSVP.

Also of interest, our member Ron Fredericks has just posted a well
edited video of the February kickoff panel discussion at Predictive
Analytics World "The R and Science of Predictive Analytics: Four Case
Studies in R" with
* Bo Cowgill, Google
* Itamar Rosenn, Facebook
* David Smith, Revolution Computing
* Jim Porzak, The Generations Network
and chaired by Michael Driscoll, Dataspora LLC

see: http://www.lecturemaker.com/2009/02/r-kickoff-video/

Best,
Jim Porzak
TGN.com
San Francisco, CA
www.linkedin.com/in/jimporzak
use R! Group SF: www.meetup.com/R-Users/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Darker markers for symbols in lattice

2009-03-30 Thread Deepayan Sarkar
On Sun, Mar 29, 2009 at 12:35 PM, Naomi B. Robbins
 wrote:
> In lattice, using the command trellis.par.get for superpose.symbol, plot,
> symbol and/or dot.symbol shows that we can specify alpha, cex, col, fill
> (for  superpose.symbol and plot.symbol), font, and pch.  Trial and error
> shows that the font affects letters but not pch=1 or pch=3 (open circles
> and plus signs.) I want to use open circles and plus signs, keep the colors
> and cex  I've specified but make the symbols bolder, much the way a
> higher lwd makes lines bolder.  Does anyone know of a library that
> does that or can anyone think of a workaround to make the markers
> stand out better without making them larger?

?grid::gpar lists 'lex' as a "Multiplier applied to line width", and
that seems to work when supplied as a top-level argument (though not
in the parameter settings):

xyplot(1:10 ~ 1:10, pch = c(1, 3), cex = 2, lex = 3)

I'm not sure if 'lwd' should have the same effect (it does in base graphics).

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with tm association analysis and Rgraphviz installation.

2009-03-30 Thread xinrong lei
Help with tm association analysis and Rgraphviz installation.

THANK YOU IN ADVANCE



Question 1:



I saved two txt files in C:\textfile.

Each txt file contains only one text column, and both have 100 records.

I know the term "research" occurs 49 times, so I want to find out which other
words are correlated with it, but I got tons of association values of 1.

I tried other terms, and no association value is less than 1, which is
obviously wrong.

Could any expert tell me what I did wrong?





My R-code is:



R>my.path<-'C:\\textfile'

R>library(tm)

R>my.corpus <- Corpus(DirSource(my.path), readerControl = list
(reader=readPlain))

R>tdmO <- TermDocMatrix(my.corpus)

R>tdmO

An object of class “TermDocMatrix”

Slot "Data":

2 x 1426 sparse Matrix of class "dgCMatrix"

   [[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]]







   [the 2 x 1426 matrix of term counts is omitted: row 1 (document 1) holds
   nearly all of the counts; row 2 (document 2) is mostly empty]

 …

R> findAssocs(tdmO, "research", 0.95)

academ  access  accompani  accord  ace  achiev  acquir  acquisit  act  activ
     1       1          1       1    1       1       1         1    1      1

activi  adapt  add  addit  adequ
     1      1    1      1      1

……
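
One likely explanation, for what it is worth: with only two documents in the
corpus, every term's frequency profile is a vector of length 2, and the
correlation between any two such vectors is forced to +1 or -1 whenever it
is defined at all, so findAssocs() can only report 1s. Treating each of the
100 records as its own document gives the correlations room to vary. A
minimal sketch (the file name and the VectorSource() route are assumptions;
TermDocumentMatrix() is the newer name for TermDocMatrix()):

library(tm)

# Read one file and treat each line (record) as a separate document.
recs <- readLines("C:/textfile/file1.txt")   # hypothetical file name
my.corpus <- Corpus(VectorSource(recs))

tdm <- TermDocumentMatrix(my.corpus)         # 100 documents, not 2
findAssocs(tdm, "research", 0.5)             # correlations can now vary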







Question 2:



I can’t load Rgraphviz in R.

I am using windows XP professional, R 2.8.1

I followed the instruction in this link

http://groups.google.com/group/r-help-archive/browse_thread/thread/413605edc81b3422/b7917083646d9cd2?lnk=gst&q=Rgraphviz#b7917083646d9cd2

and

https://stat.ethz.ch/pipermail/bioconductor/2008-June/022838.html



What I did:

1. Close down any R sessions you have open.

2. Download and install the Microsoft Visual C++ 2005 SP1 Redistributable
Package:

http://www.microsoft.com/downloads/details.aspx?familyid=200B2FD9-AE1A-4A14-984D-389C36F85647&displaylang=en

3. Download and install Graphviz 2.16.1 from the archives:

I also tried 2.18.1 and 2.22.2.

4. Check your PATH to see how Graphviz was added: Graphviz 2.18 and later
versions will automatically add

C:\Program Files\Graphviz2.16\Bin

to Path.

5. Open R and download and install Rgraphviz using:

 R> source("http://bioconductor.org/biocLite.R")

 R> biocLite("Rgraphviz")



I got no error before the next step:



R>library(Rgraphviz)

I got this error message:

Error in inDL(x, as.logical(local), as.logical(now), ...) :

  unable to load shared library
'C:/PROGRA~1/R/R-28~1.1/library/Rgraphviz/libs/Rgraphviz.dll':

  LoadLibrary failure:  The specified module could not be found.

Error : .onLoad failed in 'loadNamespace' for 'Rgraphviz'

Error: package/namespace load failed for 'Rgraphviz'
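
Two quick checks worth running before anything else (suggestions only, not
from the thread): confirm that the R process actually sees the Graphviz bin
directory on its PATH, and confirm where the failing DLL lives.

# Does the PATH that R inherited at startup mention Graphviz?
grep("Graphviz", strsplit(Sys.getenv("PATH"), ";")[[1]], value = TRUE)

# Where is the Rgraphviz libs directory that fails to load?
system.file("libs", package = "Rgraphviz")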


What else shall I do?

Thank you in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Darker markers for symbols in lattice

2009-03-30 Thread Naomi B. Robbins
Many thanks to Deepayan for providing just what I wanted.
I've tried lwd many times and it does not work, but lex does
the trick. Thanks also to Paul Murrell for his very simple
suggestion of using a lower-case "o" for an open circle, since
bold works on letters, and to Bert Gunter for suggesting an
overplotting technique to try if nothing easier came up.

Naomi

-- 

Naomi B. Robbins

NBR

11 Christine Court

Wayne, NJ 07470

 

Phone: (973) 694-6009

na...@nbr-graphs.com

http://www.nbr-graphs.com

Author of /Creating More Effective Graphs/




Deepayan Sarkar wrote:
> On Sun, Mar 29, 2009 at 12:35 PM, Naomi B. Robbins
>  wrote:
>   
>> In lattice, using the command trellis.par.get for superpose.symbol, plot,
>> symbol and/or dot.symbol shows that we can specify alpha, cex, col, fill
>> (for  superpose.symbol and plot.symbol), font, and pch.  Trial and error
>> shows that the font affects letters but not pch=1 or pch=3 (open circles
>> and plus signs.) I want to use open circles and plus signs, keep the colors
>> and cex  I've specified but make the symbols bolder, much the way a
>> higher lwd makes lines bolder.  Does anyone know of a library that
>> does that or can anyone think of a workaround to make the markers
>> stand out better without making them larger?
>> 
>
> ?grid::gpar lists 'lex' as a "Multiplier applied to line width", and
> that seems to work when supplied as a top-level argument (though not
> in the parameter settings):
>
> xyplot(1:10 ~ 1:10, pch = c(1, 3), cex = 2, lex = 3)
>
> I'm not sure if 'lwd' should have the same effect (it does in base graphics).
>
> -Deepayan
>
>
>   



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing Points on Two Regression Lines

2009-03-30 Thread John Fox
Dear Abu,

I'm not sure why you're addressing this question to me.

It's unclear from your description whether there is one sample with four 
variables or two independent samples with the same two variables x and y. I'll 
assume the latter. The formula that you sent appears to assume equal error 
variances in two independent samples. A simple alternative that doesn't assume 
equal error variances would be to use something like

mod1 <- lm(y1 ~ x1)
mod2 <- lm(y2 ~ x2)

f1 <- predict(mod1, newdata=data.frame(x1=13), se.fit=TRUE)
f2 <- predict(mod2, newdata=data.frame(x2=13), se.fit=TRUE)

diff <- f1$fit - f2$fit
sediff <- sqrt(f1$se.fit^2 + f2$se.fit^2)
diff/sediff

The df for the test statistic aren't clear to me and in small samples this 
could make a difference. I suppose that one could use a Satterthwaite 
approximation, but a simple alternative would be to take the smaller of the 
residual df, here 5 - 2 = 3. In any event, the resulting test is likely 
sensitive to departures from normality, so it would probably be better to use a 
randomization test.
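
To make the last step concrete, a sketch following the conservative-df
suggestion above (this continues the code sketch, not part of the original
reply):

# Two-sided p-value for the difference at x = 13, using the smaller
# residual df of the two fits.
t.stat  <- diff / sediff
p.value <- 2 * pt(-abs(t.stat), df = min(mod1$df.residual, mod2$df.residual))
p.value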
 
John

> -Original Message-
> From: AbouEl-Makarim Aboueissa [mailto:aabouei...@usm.maine.edu]
> Sent: March-30-09 4:57 PM
> To: j...@mcmaster.ca; jrkrid...@yahoo.ca; pbu...@pburns.seanet.com; r-
> h...@stat.math.ethz.ch; r-help-requ...@stat.math.ethz.ch;
> roland.rproj...@gmail.com; tuech...@gmx.at; www...@gmail.com
> Subject: Comparing Points on Two Regression Lines
> 
> Dear R users:
> 
> 
> 
> Suppose I have two different response variables y1, y2 that I regress
> separately on the different explanatory variables, x1 and x2 respectively. I
> need to compare points on two regression lines.
> 
> 
> 
> These are the x and y values for each lines.
> 
> 
> 
> x1<-c(0.5,1.0,2.5,5.0,10.0)
> y1<-c(204,407,1195,2740,4313)
> x2<-c(2.5,5.0,10.0,25.0)
> y2<-c(440,713,1520,2634)
> 
> 
> 
> Suppose we need to compare the two lines at the common value of x=13.
> 
> 
> 
> Please see attached the method as described in section 18.3 in Jerrold H.
> Zar.
> 
> 
> 
> With many thanks
> 
> 
> 
> Abou
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ==
> AbouEl-Makarim Aboueissa, Ph.D.
> Assistant Professor of Statistics
> Department of Mathematics & Statistics
> University of Southern Maine
> 96 Falmouth Street
> P.O. Box 9300
> Portland, ME 04104-9300
> 
> 
> Tel: (207) 228-8389
> Fax: (207) 780-5607
> Email: aabouei...@usm.maine.edu
>   aboue...@yahoo.com
> 
>   http://www.usm.maine.edu/~aaboueissa/
> 
> 
> Office: 301C Payson Smith
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use R Group SFBA April meeting reminder; video of Feb k

2009-03-30 Thread Ted Harding
On 30-Mar-09 22:13:04, Jim Porzak wrote:
> Next week Wednesday evening, April 8th, Mike Driscoll will be talking
> about "Building Web Dashboards using R"
> see: http://www.meetup.com/R-Users/calendar/9718968/ for details & to
> RSVP.
> 
> Also of interest, our member Ron Fredericks has just posted a well
> edited video of the February kickoff panel discussion at Predictive
> Analytics World "The R and Science of Predictive Analytics: Four Case
> Studies in R" with
> * Bo Cowgill, Google
> * Itamar Rosenn, Facebook
> * David Smith, Revolution Computing
> * Jim Porzak, The Generations Network
> and chaired by Michael Driscoll, Dataspora LLC
> 
> see: http://www.lecturemaker.com/2009/02/r-kickoff-video/
> 
> Best,
> Jim Porzak

It could be very interesting to watch that video! However, I have
had a close look at the web page you cite:

  http://www.lecturemaker.com/2009/02/r-kickoff-video/

and cannot find a link to a video. Lots of links to non-video
things, but none that I could see to a video.

There is a link on that page at:
  How Google and Facebook are using R
  by Michael E. Driscoll | February 19, 2009
  

Following that link leads to a page, on which the first link, in:

  <(March 26th Update: Video now available)>
  Last night, I moderated our Bay Area R Users Group kick-off
  event with a panel discussion entitled "The R and Science of
  Predictive Analytics", co-located with the Predictive Analytics
  World conference here in SF.

leads you back to where you came from, and likewise the link at
the bottom of the page:

   is now available courtesy of Ron Fredericks
  and LectureMaker.

Could you help by describing where on that web page it can be found?
With thanks,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 30-Mar-09   Time: 23:55:07
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Comparing Points on Two Regression Lines

2009-03-30 Thread AbouEl-Makarim Aboueissa
Dear R users:
 
Suppose I have two different response variables y1, y2 that I regress 
separately on the different explanatory variables, x1 and x2 respectively. I 
need to compare points on two regression lines.

These are the x and y values for each lines.

x1<-c(0.5,1.0,2.5,5.0,10.0)
y1<-c(204,407,1195,2740,4313)
x2<-c(2.5,5.0,10.0,25.0)
y2<-c(440,713,1520,2634)

Suppose we need to compare the two lines at the common value of x=13.

Please see attached the method as described in section 18.3 in Jerrold H. Zar.

With many thanks

Abou









==
AbouEl-Makarim Aboueissa, Ph.D.
Assistant Professor of Statistics
Department of Mathematics & Statistics
University of Southern Maine
96 Falmouth Street
P.O. Box 9300
Portland, ME 04104-9300


Tel: (207) 228-8389
Fax: (207) 780-5607
Email: aabouei...@usm.maine.edu
  aboue...@yahoo.com

  http://www.usm.maine.edu/~aaboueissa/


Office: 301C Payson Smith

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use R Group SFBA April meeting reminder; video of Feb k

2009-03-30 Thread Sundar Dorai-Raj
Could be that you have some sort of ad filter in your browser that's
blocking the video? It appears just fine for me in Firefox 3.

On Mon, Mar 30, 2009 at 3:55 PM, Ted Harding
 wrote:
> On 30-Mar-09 22:13:04, Jim Porzak wrote:
>> Next week Wednesday evening, April 8th, Mike Driscoll will be talking
>> about "Building Web Dashboards using R"
>> see: http://www.meetup.com/R-Users/calendar/9718968/ for details & to
>> RSVP.
>>
>> Also of interest, our member Ron Fredericks has just posted a well
>> edited video of the February kickoff panel discussion at Predictive
>> Analytics World "The R and Science of Predictive Analytics: Four Case
>> Studies in R" with
>>     * Bo Cowgill, Google
>>     * Itamar Rosenn, Facebook
>>     * David Smith, Revolution Computing
>>     * Jim Porzak, The Generations Network
>> and chaired by Michael Driscoll, Dataspora LLC
>>
>> see: http://www.lecturemaker.com/2009/02/r-kickoff-video/
>>
>> Best,
>> Jim Porzak
>
> It could be very interesting to watch that video! However, I have
> had a close look at the web page you cite:
>
>  http://www.lecturemaker.com/2009/02/r-kickoff-video/
>
> and cannot find a link to a video. Lots of links to non-video
> things, but none that I could see to a video.
>
> There is a link on that page at:
>  How Google and Facebook are using R
>  by Michael E. Driscoll | February 19, 2009
>  
>
> Following that link leads to a page, on which the first link, in:
>
>  <(March 26th Update: Video now available)>
>  Last night, I moderated our Bay Area R Users Group kick-off
>  event with a panel discussion entitled "The R and Science of
>  Predictive Analytics", co-located with the Predictive Analytics
>  World conference here in SF.
>
> leads you back to where you came from, and likewise the link at
> the bottom of the page:
>
>   is now available courtesy of Ron Fredericks
>  and LectureMaker.
>
> Could you help by describing where on that web page it can be found?
> With thanks,
> Ted.
>
> 
> E-Mail: (Ted Harding) 
> Fax-to-email: +44 (0)870 094 0861
> Date: 30-Mar-09                                       Time: 23:55:07
> -- XFMail --
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2-geom_text()

2009-03-30 Thread Paul Murrell
Hi


Felipe Carrillo wrote:
> Hi: I need help with geom_text().
>  I would like to count the number of Locations
>  and put the sum of it right above each bar.
> 
> x <- "Location Lake_dens Fish Pred
> Lake1 1.132   1   0.115
> Lake1 0.627   1   0.148
> Lake1 1.324   1   0.104
> Lake1 1.265   1   0.107
> Lake2 1.074   0   0.096
> Lake2 0.851   0   0.108
> Lake2 1.098   0   0.095
> Lake2 0.418   0   0.135
> Lake2 1.256   1   0.088
> Lake2 0.554   1   0.126
> Lake2 1.247   1   0.088
> Lake2 0.794   1   0.112
> Lake2 0.181   0   0.152
> Lake3 1.694   0   0.001
> Lake3 1.018   0   0.001
> Lake3 2.880   0"
> DF <- read.table(textConnection(x), header = TRUE)
>  p <- ggplot(DF,aes(x=Location)) + geom_bar()
>  p + geom_text(aes(y=Location),label=sum(count)) # Error because count 
> doesn't exist in dataset
> 
>  What should I use instead of 'count' to be able to sum the number
>  of Locations?


How about ... ?

 p + geom_text(aes(label=..count..), stat="bin",
   vjust=1, colour="white")

Paul


> Felipe D. Carrillo  
> Supervisory Fishery Biologist  
> Department of the Interior  
> US Fish & Wildlife Service  
> California, USA
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use R Group SFBA April meeting reminder; video of Feb k

2009-03-30 Thread Jim Porzak
Since Sundar beat me to it w/ Firefox 3 test, I checked with IE 7.0 -
works fine for me there also.

-Jim



On Mon, Mar 30, 2009 at 4:00 PM, Sundar Dorai-Raj  wrote:
> Could be that you have some sort of ad filter in your browser that's
> blocking the video? It appears just fine for me in Firefox 3.
>
> On Mon, Mar 30, 2009 at 3:55 PM, Ted Harding
>  wrote:
>> On 30-Mar-09 22:13:04, Jim Porzak wrote:
>>> Next week Wednesday evening, April 8th, Mike Driscoll will be talking
>>> about "Building Web Dashboards using R"
>>> see: http://www.meetup.com/R-Users/calendar/9718968/ for details & to
>>> RSVP.
>>>
>>> Also of interest, our member Ron Fredericks has just posted a well
>>> edited video of the February kickoff panel discussion at Predictive
>>> Analytics World "The R and Science of Predictive Analytics: Four Case
>>> Studies in R" with
>>>     * Bo Cowgill, Google
>>>     * Itamar Rosenn, Facebook
>>>     * David Smith, Revolution Computing
>>>     * Jim Porzak, The Generations Network
>>> and chaired by Michael Driscoll, Dataspora LLC
>>>
>>> see: http://www.lecturemaker.com/2009/02/r-kickoff-video/
>>>
>>> Best,
>>> Jim Porzak
>>
>> It could be very interesting to watch that video! However, I have
>> had a close look at the web page you cite:
>>
>>  http://www.lecturemaker.com/2009/02/r-kickoff-video/
>>
>> and cannot find a link to a video. Lots of links to non-video
>> things, but none that I could see to a video.
>>
>> There is a link on that page at:
>>  How Google and Facebook are using R
>>  by Michael E. Driscoll | February 19, 2009
>>  
>>
>> Following that link leads to a page, on which the first link, in:
>>
>>  <(March 26th Update: Video now available)>
>>  Last night, I moderated our Bay Area R Users Group kick-off
>>  event with a panel discussion entitled "The R and Science of
>>  Predictive Analytics", co-located with the Predictive Analytics
>>  World conference here in SF.
>>
>> leads you back to where you came from, and likewise the link at
>> the bottom of the page:
>>
>>   is now available courtesy of Ron Fredericks
>>  and LectureMaker.
>>
>> Could you help by describing where on that web page it can be found?
>> With thanks,
>> Ted.
>>
>> 
>> E-Mail: (Ted Harding) 
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 30-Mar-09                                       Time: 23:55:07
>> -- XFMail --
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mapping in R

2009-03-30 Thread Kelsey Scheitlin
Hi, I am looking for a specific mapping capability in R that I can't seem to
find, but think exists. I would like the border of a map to have alternating
black and white squares instead of the common latitude and longitude grid
(example: http://www.cccturtle.org/sat_maps/map0bw8.gif). If anyone knows
whether there is a function capable of doing this, could you please let me
know? Thanks!
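
For the drawing itself, the effect can be produced by hand with rect(); a
minimal sketch on a plain plot (tick positions are arbitrary here and would
need to be computed in degrees for a real map):

# Alternating black/white "picture frame" segments along the bottom
# and left edges of the plot region.
plot(0:10, 0:10, type = "n", xlab = "Longitude", ylab = "Latitude")
usr <- par("usr")                        # plot-region limits: x1, x2, y1, y2
xt  <- seq(usr[1], usr[2], length.out = 21)
yt  <- seq(usr[3], usr[4], length.out = 21)
h   <- 0.015 * diff(usr[3:4])            # frame thickness along y
w   <- 0.015 * diff(usr[1:2])            # frame thickness along x
for (i in 1:20) {
  fill <- c("black", "white")[1 + i %% 2]
  rect(xt[i], usr[3], xt[i + 1], usr[3] + h, col = fill)   # bottom edge
  rect(usr[1], yt[i], usr[1] + w, yt[i + 1], col = fill)   # left edge
}
box()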

Kelsey

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] two monitors

2009-03-30 Thread Veerappa Chetty
Hi, I have set up two monitors. I am using Windows XP.  I would like to keep
one window - the command line - on one monitor and the script and graphs on
the second monitor. How do I set this up?
It works for Word documents simply by dragging the document, but it does not
work if I drag and drop the script window. Is R not compatible with this?
Thanks.
Chetty

-- 
Professor of Family Medicine
Boston University
Tel: 617-414-6221, Fax:617-414-3345
emails: chett...@gmail.com,vche...@bu.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] advice for alternative to barchart

2009-03-30 Thread kerfuffle

hi folks,

I was wondering if anybody could give me some advice.  I've created a
stacked barchart, with 'car model' along the x axis, 'number of cars' along
the y axis.  There are 45 individuals involved, each of which can own any
number of cars, of any model (eg an individual could own two cars of one
model, and another car of a different model).  I've got a legend by the side
of the barchart which gives the name of the individual, which gives the
colour to identify which bars belong to which individuals.

The problem (as you've probably guessed) is that it's almost impossible to
have a distinctive legend for 45 individuals.  I can manage 30 distinct
colours, but as soon as I use shaded lines the number of distinct colours
drops considerably because the legend boxes are so small.  This is true even
if I vary line density and angle.  Therefore, after a long period of
experimentation, I'm thinking of giving up on barchart.

What I have in mind now is a plot where each 'bar' is a single line, and the
top of each 'bar' is a symbol (+, *, etc).  I figure it should be possible
to find 45 different symbols.  Does anyone have any advice?  I'm sorry this
is so open-ended, but I've played with stripchart and dotplot without a lot
of joy.  I figure this can't be that uncommon a need (barchart with a
ridiculous number of groups), but I could well be wrong.  Is there some way
of altering the size of the legend boxes in the barchart?  Using symbols in
the barchart?  Some way of using, say, 30 blocks of colour, and 15 cases of
a dashed line?  

Any thoughts would be greatly appreciated.
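
On the symbol question, base R's integer plotting symbols plus single
characters easily supply 45 distinct markers; note that pch cannot mix
integers and characters in one vector, so two calls are needed. A quick
sketch of the available supply:

# 26 built-in symbols (pch 0:25) on one row...
plot(1:26, rep(2, 26), pch = 0:25, cex = 1.5, ylim = c(0, 3),
     axes = FALSE, xlab = "", ylab = "")
# ...plus single-character symbols for the remaining 19 individuals.
points(1:19, rep(1, 19), pch = letters[1:19], cex = 1.5)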

Paul
-- 
View this message in context: 
http://www.nabble.com/advice-for-alternative-to-barchart-tp22795050p22795050.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two monitors

2009-03-30 Thread Daniel Viar
Try Edit --> Gui Preferences --> SDI
and see if that works.

Dan Viar
Chesapeake, VA


On Mon, Mar 30, 2009 at 5:55 PM, Veerappa Chetty  wrote:
> Hi, I have set up two monitors. I am using windows XP.  I would like to keep
> one window- command line in one monitor and the script and graphs in the
> second monitor. How do I set it up?
> It works for word documents simply by dragging the document. It does not
> work if I drag and drop the scripts window. Is R not compatible for this?
> Thanks.
> Chetty
>
> --
> Professor of Family Medicine
> Boston University
> Tel: 617-414-6221, Fax:617-414-3345
> emails: chett...@gmail.com,vche...@bu.edu
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Can I read a file into my workspace from Rprofile.site?

2009-03-30 Thread Elaine Jones


I am running R version 2.8.1 on  Windows XP OS.

When I launch R, I would like to automatically read a file containing my
database connections, user ids, and passwords into my workspace.

I tried including this in my Rprofile.site file:

...
local({
old <- getOption("defaultPackages")
options(defaultPackages = c(old, "Rcmdr","RODBC", "utils"))
})

.First <- function() {
library(utils)
setwd("C:/Documents and Settings/Administrator/My Documents/R")
connections <- read.csv("connections.csv", header=TRUE)
cat("\n   Welcome to R Elaine!\n\n")
}

...

When I launch R, it does not give me any error. The working directory
appears to be set by the Rprofile.site file, but the connections object is
not in my workspace:


   Welcome to R Elaine!

Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: car

Rcmdr Version 1.4-7

> ls()
character(0)


Any suggestions for how to resolve are appreciated!

 Elaine McGovern Jones 

 ISC Tape and DASD Storage Products
 Characterization and Failure Analysis Engineering
   Phone:  408  284 4853  Internal: 3-4853
   jon...@us.ibm.com




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can I read a file into my workspace from Rprofile.site?

2009-03-30 Thread Duncan Murdoch

Elaine Jones wrote:

I am running R version 2.8.1 on  Windows XP OS.

When I launch R, I would like to automatically read a file containing my
database connections, user ids, and passwords into my workspace.

I tried including this in my Rprofile.site file:

...
local({
old <- getOption("defaultPackages")
options(defaultPackages = c(old, "Rcmdr","RODBC", "utils"))
})

.First <- function() {
library(utils)
setwd("C:/Documents and Settings/Administrator/My Documents/R")
connections <- read.csv("connections.csv", header=TRUE)
cat("\n   Welcome to R Elaine!\n\n")
}

  
The connections variable will be local to .First, and will disappear 
after that function is done.  To save the variable into the global 
environment, use


connections <<- read.csv("connections.csv", header=TRUE)

instead.
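
An equivalent, slightly more explicit form (a sketch, not part of the
original reply) assigns into the global environment directly:

.First <- function() {
  library(utils)
  setwd("C:/Documents and Settings/Administrator/My Documents/R")
  # Put the object in the workspace explicitly rather than via <<-
  assign("connections",
         read.csv("connections.csv", header = TRUE),
         envir = .GlobalEnv)
  cat("\n   Welcome to R Elaine!\n\n")
}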

Duncan Murdoch

...

When I launch R, it does not give me any error. The working directory
appears to be set by the Rprofile.site file, but the connections object is
not in my workspace:


   Welcome to R Elaine!

Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: car

Rcmdr Version 1.4-7

  

ls()
character(0)


Any suggestions for how to resolve are appreciated!

 Elaine McGovern Jones 

 ISC Tape and DASD Storage Products
 Characterization and Failure Analysis Engineering
   Phone:  408  284 4853  Internal: 3-4853
   jon...@us.ibm.com




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use R Group SFBA April meeting reminder; video of Feb k

2009-03-30 Thread Ted Harding
On 30-Mar-09 23:04:40, Jim Porzak wrote:
> Since Sundar beat me to it w/ Firefox 3 test, I checked with IE 7.0 -
> works fine for me there also.
> 
> -Jim

Interesting! I'm using Iceweasel 2.0.0.19  (Firefox under another
name) on Linux. I'll have to check out what blocks it has activated!
I put on a fragrantly scented facemask and started IE up on Windows.
Going to the same URL, I now find a big "video screen" just below
the line:

The R and Science of Predictive Analytics: Four Case Studies in R -- The Video

And it duly plays.

But at the same place in my Firefox, I only see a little button
inviting me to "Get Adobe Flash Player". But I already have that
installed for Iceweasel!. Well, maybe it needs updating. Let me
try that ... It says "Adobe Flash Player version 10.0.22.87" and
I have flashplayer_9 already there, so ... (some time later) I now
have flashplayer_10 installed, but I still get the same result.
Hmmm  
Well, thanks for helping to locate what the problem might be!
Ted.



> On Mon, Mar 30, 2009 at 4:00 PM, Sundar Dorai-Raj 
> wrote:
>> Could be that you have some sort of ad filter in your browser that's
>> blocking the video? It appears just fine for me in Firefox 3.
>>
>> On Mon, Mar 30, 2009 at 3:55 PM, Ted Harding
>>  wrote:
>>> On 30-Mar-09 22:13:04, Jim Porzak wrote:
 Next week Wednesday evening, April 8th, Mike Driscoll will be
 talking
 about "Building Web Dashboards using R"
 see: http://www.meetup.com/R-Users/calendar/9718968/ for details &
 to
 RSVP.

 Also of interest, our member Ron Fredericks has just posted a well
 edited video of the February kickoff panel discussion at Predictive
 Analytics World "The R and Science of Predictive Analytics: Four
 Case
 Studies in R" with
 _ _ * Bo Cowgill, Google
 _ _ * Itamar Rosenn, Facebook
 _ _ * David Smith, Revolution Computing
 _ _ * Jim Porzak, The Generations Network
 and chaired by Michael Driscoll, Dataspora LLC

 see: http://www.lecturemaker.com/2009/02/r-kickoff-video/

 Best,
 Jim Porzak
>>>
>>> It could be very interesting to watch that video! However, I have
>>> had a close look at the web page you cite:
>>>
>>> _http://www.lecturemaker.com/2009/02/r-kickoff-video/
>>>
>>> and cannot find a link to a video. Lots of links to non-video
>>> things, but none that I could see to a video.
>>>
>>> There is a link on that page at:
>>> _How Google and Facebook are using R
>>> _by Michael E. Driscoll | February 19, 2009
>>> _
>>>
>>> Following that link leads to a page, on which the first link, in:
>>>
>>> _<(March 26th Update: Video now available)>
>>> _Last night, I moderated our Bay Area R Users Group kick-off
>>> _event with a panel discussion entitled "The R and Science of
>>> _Predictive Analytics", co-located with the Predictive Analytics
>>> _World conference here in SF.
>>>
>>> leads you back to where you came from, and likewise the link at
>>> the bottom of the page:
>>>
>>> _ is now available courtesy of Ron Fredericks
>>> _and LectureMaker.
>>>
>>> Could you help by describing where on that web page it can be found?
>>> With thanks,
>>> Ted.
>>>
>>> 
>>> E-Mail: (Ted Harding) 
>>> Fax-to-email: +44 (0)870 094 0861
>>> Date: 30-Mar-09 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Time: 23:55:07
>>> -- XFMail --
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 31-Mar-09   Time: 00:49:08
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use R Group SFBA April meeting reminder; video of Feb k

2009-03-30 Thread Rolf Turner


I get to the video screen OK --- there's a large greenish sideways
triangle waiting to be clicked on.  I do so; there's a message that
says it's downloading, with a little progress bar.  That seems to
complete quite rapidly.  Then nothing for a while.  Then an error
message on the video screen saying ``Fatal error --- video source
not ready.''  Then that error message goes away.  Long wait.  Then
I get audio, but never any video.  Give up.

I'm using Firefox on an iMac; the ``About Mozilla Firefox'' button
on the Firefox dropdown menu says I've got Mozilla 5.0, Firefox 2.0.0.2
--- whatever that means.

Bottom line --- I can't watch the video.

But that's the story of my life.  ***Nothing*** ever works for me! :-)
Except R, which *mostly* works.

cheers,

Rolf Turner

On 31/03/2009, at 12:49 PM, Ted Harding wrote:


On 30-Mar-09 23:04:40, Jim Porzak wrote:

Since Sundar beat me to it w/ Firefox 3 test, I checked with IE 7.0 -
works fine for me there also.

-Jim


Interesting! I'm using Iceweasel 2.0.0.19  (Firefox under another
name) on Linux. I'll have to check out what blocks it has activated!
I put on a fragrantly scented facemask and started IE up on Windows.
Going to the same URL, I now find a big "video screen" just below
the line:

The R and Science of Predictive Analytics: Four Case Studies in R -- The Video

And it duly plays.

But at the same place in my Firefox, I only see a little button
inviting me to "Get Adobe Flash Player". But I already have that
installed for Iceweasel! Well, maybe it needs updating. Let me
try that ... It says "Adobe Flash Player version 10.0.22.87" and
I have flashplayer_9 already there, so ... (some time later) I now
have flashplayer_10 installed, but I still get the same result.
Hmmm 
Well, thanks for helping to locate what the problem might be!
Ted.



On Mon, Mar 30, 2009 at 4:00 PM, Sundar Dorai-Raj  


wrote:

Could be that you have some sort of ad filter in your browser that's
blocking the video? It appears just fine for me in Firefox 3.

On Mon, Mar 30, 2009 at 3:55 PM, Ted Harding
 wrote:

On 30-Mar-09 22:13:04, Jim Porzak wrote:

Next week Wednesday evening, April 8th, Mike Driscoll will be
talking
about "Building Web Dashboards using R"
see: http://www.meetup.com/R-Users/calendar/9718968/ for details &
to
RSVP.

Also of interest, our member Ron Fredericks has just posted a well
edited video of the February kickoff panel discussion at  
Predictive

Analytics World "The R and Science of Predictive Analytics: Four
Case
Studies in R" with
_ _ * Bo Cowgill, Google
_ _ * Itamar Rosenn, Facebook
_ _ * David Smith, Revolution Computing
_ _ * Jim Porzak, The Generations Network
and chaired by Michael Driscoll, Dataspora LLC

see: http://www.lecturemaker.com/2009/02/r-kickoff-video/

Best,
Jim Porzak


It could be very interesting to watch that video! However, I have
had a close look at the web page you cite:

_http://www.lecturemaker.com/2009/02/r-kickoff-video/

and cannot find a link to a video. Lots of links to non-video
things, but none that I could see to a video.

There is a link on that page at:
_How Google and Facebook are using R
_by Michael E. Driscoll | February 19, 2009
_

Following that link leads to a page, on which the first link, in:

_<(March 26th Update: Video now available)>
_Last night, I moderated our Bay Area R Users Group kick-off
_event with a panel discussion entitled "The R and Science of
_Predictive Analytics", co-located with the Predictive Analytics
_World conference here in SF.

leads you back to where you came from, and likewise the link at
the bottom of the page:

_ is now available courtesy of Ron Fredericks
_and LectureMaker.

Could you help by describing where on that web page it can be  
found?

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 30-Mar-09                                       Time: 23:55:07
-- XFMail --


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 31-Mar-09   Time: 00:49:08
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
