Re: [R] What is the best way to lag a time series?

2010-12-26 Thread Liviu Andronic
On Sun, Dec 26, 2010 at 8:49 AM, Christian Schoder
schoc...@newschool.edu wrote:
 Dear R-users,

 I've been using R for a while and I am very satisfied! Unfortunately, I
 still have not figured out an efficient and general way to construct and
 use lags of time series, especially when I need to work with different
 packages.

 Let me give an example. I have two time series x and y and I want to
 estimate a variety of distributed lag models and run different tests
 (autocorrelation, etc). It is obvious that I need to be able to lag x
 and y in a flexible way. So far, my temporary solution was to construct
 the lags manually (x1,..,xn and y1,..,yn) in a spreadsheet and import it
 to R, which is not very satisfactory because it does not allow for much
 flexibility.

 Is there a straightforward command which allows me to easily construct a
 lag

Perhaps ?diff.

Liviu


 when required and which allows me to, for example, use the lm()
 command to fit a dynamic model and the bgtest() command to perform the
 Breusch-Godfrey test on the same model?

 Is it advisable to use time series objects which consist of many time
 series (like a dataframe) or is it better to have it contain only one
 time series?

 I would be grateful for any hints and links.

 Thx!
 Christian

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail



[R] object names from character strings

2010-12-26 Thread Jim Bouldin

I realize this is probably pretty basic but I can't figure it out.

I'm looping through an array, doing various calculations and producing a 
resulting data frame in each loop iteration.  I need to give each data 
frame a different name.  Although I can easily create a new character 
string for writing each frame to an output file, I cannot figure out how 
to convert such strings to corresponding object names within the R 
workspace itself, so as to give each d.f. a distinct name.  The closest 
I got were various attempts with the as.name function, but couldn't get 
that to work either.  Any help appreciated.  Thanks.


--
Jim Bouldin, PhD
Research Ecologist



Re: [R] What is the best way to lag a time series?

2010-12-26 Thread Patrick Burns

First off, there are data manipulation
techniques that will beat doing it in
a spreadsheet.  For example:

head(x, -1)

is lagged 1 relative to

tail(x, -1)

But I think you are really looking for
'Lag' in the 'quantmod' package.
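
The head()/tail() pairing above can be sketched in base R as follows, together with embed(), which builds several lags at once. This is only an illustrative sketch: the vector x is simulated, and the column names x0, x1, x2 are made up for the example.

```r
set.seed(1)
x <- rnorm(20)                 # simulated series, for illustration only

# head/tail pairing: x_lag1[k] is the value one step before x_now[k]
x_lag1 <- head(x, -1)          # x[1], ..., x[19]
x_now  <- tail(x, -1)          # x[2], ..., x[20]

# embed() puts the current value and its lags side by side:
# row k holds x[k + 2], x[k + 1], x[k]
E <- embed(x, 3)
colnames(E) <- c("x0", "x1", "x2")   # hypothetical names: lag 0, 1, 2

# The aligned columns can go straight into lm()
fit <- lm(x0 ~ x1 + x2, data = as.data.frame(E))
```

Because every column of E has the same length, no manual trimming is needed before calling lm().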




--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



Re: [R] How to specify ff object filepaths when reading a CSV file into a ff data frame.

2010-12-26 Thread Xiaobo Gu
Hi, I have done another simple test: I tried both syntaxes against a
CSV file with only one column, and both succeed.

> fdf <- read.csv.ffdf(file="D:/rtemp/fftest2.csv",asffdf_args = list( col_args = list(filename=c("F:/a.f"))))
> fdf
ffdf (all open) dim=c(2,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
     PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
col1         col1      integer       integer FALSE           FALSE
     PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
col1            FALSE                 1                1               1
     PhysicalIsOpen
col1           TRUE
ffdf data
  col1
1    1
2    2


> fdf <- read.csv.ffdf(file="D:/rtemp/fftest2.csv",asffdf_args = list( col_args = c(list(filename="D:/a2.f"))))
> fdf
ffdf (all open) dim=c(2,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
     PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
col1         col1      integer       integer FALSE           FALSE
     PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
col1            FALSE                 1                1               1
     PhysicalIsOpen
col1           TRUE
ffdf data
  col1
1    1
2    2


Regards,

Xiaobo Gu



On Fri, Dec 24, 2010 at 11:27 PM, Xiaobo Gu guxiaobo1...@gmail.com wrote:
 Hi,
    The read.csv.ffdf function in package ff will create the ff object
 physical file in the default directories, I am trying to let the files
 created in the paths users specify, I think the point is to make use
 of the asffdf_args parameter,
 I have a test CSV file named D:\rtemp\fftest.csv, the content of the
 file is as following:

 col1,col2,col3
 1,amber,2.4
 2,linda,4.5

 I tried the following code, hoping ff will create the physical files
 for col1,col2 and col3 to D:/a.f,D:/b.f,D:/c.f respectively

  fdf <- read.csv.ffdf(file="D:/rtemp/fftest.csv",asffdf_args = list(
 col_args =  c(list(filename="D:/a.f"), list(filename="D:/b.f"),
 list(filename="D:/c.f"))))
 and the error message is:
 Error in as.ff.default(1:2, vmode = NULL, filename = "D:/a.f",
 filename = "D:/b.f",  :
   formal argument "filename" matched by multiple actual arguments

 I also tried the following:

 fdf <- read.csv.ffdf(file="D:/rtemp/fftest.csv",asffdf_args = list( col_args =  list(filename=c("D:/a.f","D:/b.f","D:/c.f"))))
 Error in ff(initdata = initdata, length = length, levels = levels,
 ordered = ordered,  :
   bad argument initdata for existing file; initializing existing file is invalid
 In addition: Warning messages:
 1: In if (file.exists(filename)) { :
   the condition has length > 1 and only the first element will be used
 2: In if (file.exists(filename)) { :
   the condition has length > 1 and only the first element will be used
 3: In if (file.access(filename, 4) == -1) { :
   the condition has length > 1 and only the first element will be used
 4: In if (file.access(filename, 2) == -1) { :
   the condition has length > 1 and only the first element will be used
 5: In if (is.na(filesize)) stop("unable to open file") :
   the condition has length > 1 and only the first element will be used

 My questions are:
 1. What's the datatype of the col_args parameter of the as.ffdf function
 2. If I can make layout of the asffdf_args parameter correct, how can
 I set the exact filenames for each column of the ff data frame.

 Regards,

 Xiaobo Gu




[R] R2WinBugs data import error

2010-12-26 Thread unsown

For some purpose, I need to transfer an array of NAs to WinBUGS through
R2WinBUGS, but I constantly get the error message: 'type' must be real for
this format. Here is the data I want to transfer:

x = matrix(data=NA,nrow=3,ncol=3)
x =  as.array(x)
data <- list (x)

if I add a line to above setting, then I can pass R2WinBugs:

x[1,1] = 0

If I manually input the NA array to WinBugs, I could get it running. So my
original data set has no problem with WinBugs.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R2WinBugs-data-import-error-tp3164106p3164106.html
Sent from the R help mailing list archive at Nabble.com.



[R] Fitting mixtures with non-linear parameters constraints

2010-12-26 Thread Jonathan Rosenblatt
Dear R users

Does anyone happen to know a function to fit a Gaussian mixture using
*non-linear* constraints between the parameters? (An EM that allows
that will obviously do the job).

Thank you in advance

--
Jonathan Rosenblatt
www.john-ros.com



[R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Marius Hofert
Dear expeRts,

how can I decrease the space between the tick marks and the corresponding 
labels in an splom?
See here:

library(lattice)
U <- matrix(runif(4000), ncol = 8)
splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and
                              #    tick marks is/seems to be too large

I checked ?panel.pairs but could not find an option for that.

Cheers,

Marius


[R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread Dror D Lev
Dear r helpers,

I would like to look at the interaction between two two-level factors, one
between and one within participants, after accounting for any variance due
to practice (31 trials in each of two blocks) in the task.
It seems to require treating practice as a covariate.

All the examples I noticed for handling covariates (i.e. ANCOVA, including
the ones in Faraway's "Practical Regression and Anova using R") use lm(),
but this doesn't handle repeated measures.

I thought of a solution in the form of first running a regression on the
covariate:
 cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)

and then running aov() on the residuals:
 m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem?

Is there an existing function that can do this (perhaps more appropriately)?

Thank you for any help,
dror



[R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

Dear all,

My double for loop is shown below, but it is not very efficient. I hope
someone can give me a vectorized program to replace my code. Thanks.

x: a 202*263 matrix, that is, 202 samples and 263 independent variables

num.compd <- nrow(x)  # number of compounds
diss.all <- 0
for (i in 1:num.compd)
  for (j in 1:num.compd)
    if (i != j) {
      S1 <- sum(x[i,]*x[j,])
      S2 <- sum(x[i,]^2)
      S3 <- sum(x[j,]^2)
      sim2 <- S1/(S2+S3-S1)
      diss2 <- 1-sim2
      diss.all <- diss.all+diss2
    }

it will cost a long time to finish this computation! i really need rapid
code to replace my code.

thanks

kevin


-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R2WinBugs data import error

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 12:44 AM, unsown wrote:



For some purpose, I  need to transfer a NAs array to WinBugs through
R2WinBugs, But I constantly got an error message:'type' must be real for
this format. Here is my data to transfer:

x = matrix(data=NA,nrow=3,ncol=3)


str(x)
It is of mode logical.

Try instead:
x = matrix(vector(mode="numeric",0) ,nrow=3,ncol=3)
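
The difference in storage mode can be checked directly. The following is a small sketch, not a confirmed fix for the R2WinBUGS error: it only shows that a matrix built from a bare NA is logical, and two base-R ways to get an all-NA matrix of type double instead. The names x_logical, x_real and x_conv are made up for the illustration.

```r
# A bare NA is a logical constant, so the matrix is logical
x_logical <- matrix(data = NA, nrow = 3, ncol = 3)
storage.mode(x_logical)

# NA_real_ is the missing value of type double
x_real <- matrix(data = NA_real_, nrow = 3, ncol = 3)
storage.mode(x_real)

# An existing logical matrix can also be converted in place
x_conv <- x_logical
storage.mode(x_conv) <- "double"
```

This would also be consistent with the observation that assigning x[1,1] = 0 makes the problem go away, since that assignment coerces the whole matrix to double.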



x =  as.array(x)
data <- list (x)


Why are you making a list with a single character element? If you need
to pass the matrix you just created in a list then try (and don't use
data as the name):


dat <- list(x)





if I add a line to above setting, then I can pass R2WinBugs:

x[1,1] = 0

If I manually input the NA array to WinBugs, I could get it running. So my
original data set has no problem with WinBugs.
--



David Winsemius, MD
West Hartford, CT



Re: [R] object names from character strings

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 4:04 AM, Jim Bouldin wrote:


I realize this is probably pretty basic but I can't figure it out.

I'm looping through an array, doing various calculations and  
producing a resulting data frame in each loop iteration.  I need to  
give each data frame a different name.  Although I can easily create  
a new character string for writing each frame to an output file, I  
cannot figure out how to convert such strings to corresponding  
object names within the R workspace itself, so as to give each d.f.  
a distinct name.  The closest I got were various attempts with the  
as.name function, but couldn't get that to work either.  Any help  
appreciated.  Thanks.


Here's the first example in the help(assign) page:

for(i in 1:6) { #-- Create objects 'r.1', 'r.2', ... 'r.6'
   nam <- paste("r",i, sep=".")
   assign(nam, 1:i) }
ls(pattern = "^r..$")




--
Jim Bouldin, PhD
Research Ecologist



David Winsemius, MD
West Hartford, CT



Re: [R] object names from character strings

2010-12-26 Thread jim holtman
Consider storing the dataframes in a list so that you do not have to
create unique names and it will also give you better control by
keeping all the data together in one object.
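
A minimal sketch of the list approach, assuming each loop iteration produces one data frame. The data frame contents and the names run_1, run_2, ... are invented for the example.

```r
results <- list()
for (i in 1:3) {
  # stand-in for the real per-iteration calculation
  df <- data.frame(id = i, value = (1:4) * i)
  # the character string becomes the name of the list element
  results[[paste0("run_", i)]] <- df
}

names(results)        # the generated names
results[["run_2"]]    # look one data frame up by its string name

# all frames can be processed or combined in one step
all_rows <- do.call(rbind, results)
```

The same strings used to build output file names can serve as the list names, so no objects need to be created in the workspace by name.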





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:


Dear expeRts,

how can I decrease the space between the tick marks and the  
corresponding labels in an splom?

See here:

library(lattice)
U <- matrix(runif(4000), ncol = 8)
splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and
                              #    tick marks is/seems to be too large


So you want more tick marks?



I checked ?panel.pairs but could not find an option for that.


What about the pscales argument?

A single number would increase the number of ticks, or a list with
at and labels values can be passed. Seems to be just what you asked
for.


--

David Winsemius, MD
West Hartford, CT



[R] T2 hoteling

2010-12-26 Thread leyla khodakarim
Dear All

It is very kind of you to guide me.

When I run this line, I get the following error:

stat.obs <- apply(GS, 2, function(z) Hott2(t(DATA[which(z==1),]), cl))

Error in colSums(w * x) : 'x' must be an array of at least two dimensions

where:

cl <- as.factor(y)

GS: gene sets, a matrix of 0s and 1s
- a data matrix with rows=genes, columns=gene sets
- GS[i,j]=1 if gene i is in gene set j
- GS[i,j]=0 otherwise

Hott2 <- function(x, y, var.equal=TRUE) # Hotelling's T2

Y <- c(1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,1,0,1,0,1)

Data = transpose(X) = gene expression: rows = 40 genes, columns = 10 samples

Data: in the attached file

Thanks a lot




Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote:


Dear r helpers,

I would like to look at the interaction between two two-level factors, one
between and one within participants, after accounting for any variance due
to practice (31 trials in each of two blocks) in the task.
It seems to require treating practice as a covariate.

All the examples I noticed for handling covariates (i.e. ANCOVA, including
the ones in Faraway's "Practical Regression and Anova using R") use lm(),
but this doesn't handle repeated measures.


See if Dalgaard's piece in R-News offers better guidance:

http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf




I thought of a solution in the form of first running a regression on the
covariate:

cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)

and then running aov() on the residuals:

m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem?

Is there an existing function that can do this (perhaps more  
appropriately)?


Thank you for any help,
dror

--

David Winsemius, MD
West Hartford, CT



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Berend Hasselman


bbslover wrote:
 
 x: is a matrix  202*263,  that is 202 samples, and 263 independent
 variables
 
 num.compd <- nrow(x)  # number of compounds
 diss.all <- 0
 for (i in 1:num.compd)
   for (j in 1:num.compd)
     if (i != j) {
       S1 <- sum(x[i,]*x[j,])
       S2 <- sum(x[i,]^2)
       S3 <- sum(x[j,]^2)
       sim2 <- S1/(S2+S3-S1)
       diss2 <- 1-sim2
       diss.all <- diss.all+diss2
     }
 
 it will cost a long time to finish this computation! i really need rapid
 code to replace my code.
 

Alternative 1:  j-loop only needs to start at i+1 so

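diss2.all <- 0
for (i in 1:num.compd) {
  # pairs (i, j) with j > i only; the sequence is empty when i == num.compd
  for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
    S1 <- sum(x[i,]*x[j,])
    S2 <- sum(x[i,]^2)
    S3 <- sum(x[j,]^2)
    sim2 <- S1/(S2+S3-S1)
    diss2 <- 1-sim2
    diss2.all <- diss2.all+diss2
  }
}
# the measure is symmetric in i and j, so double the half sum
diss2.all <- 2 * diss2.all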

On my pc this is about twice as fast as your version (with 202 samples and
263 variables)

Alternative 2: all sum() are not necessary. Use some matrix algebra:

xtx <- x %*% t(x)   # xtx[i,j] is the inner product of rows i and j
diss3.all <- 0
for (i in 1:num.compd) {
  for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
    S1 <- xtx[i,j]
    S2 <- xtx[i,i]
    S3 <- xtx[j,j]
    sim2 <- S1/(S2+S3-S1)
    diss2 <- 1-sim2
    diss3.all <- diss3.all+diss2
  }
}
diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1.

I'm quite sure that more expert R gurus can get some more speed up.
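
One possible further step, offered as a sketch rather than a benchmarked result: once the matrix of inner products is available, both loops can be dropped by building the denominators with outer(). The object names xtx, d, sim and diss.vec are illustrative, and x is generated the same way as in the timings in this message.

```r
set.seed(1)
x <- matrix(runif(202 * 263), nrow = 202)

xtx <- tcrossprod(x)                   # same as x %*% t(x)
d   <- diag(xtx)                       # squared norms: S2 and S3 in the loops
sim <- xtx / (outer(d, d, "+") - xtx)  # sim[i,j] = S1 / (S2 + S3 - S1)

# sum the dissimilarities over all ordered pairs with i != j
off.diag <- row(sim) != col(sim)
diss.vec <- sum((1 - sim)[off.diag])
```

Since sim is symmetric, this sums each unordered pair twice, which matches the doubling at the end of the two loop versions.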

Note: I generated the x matrix with:
set.seed(1); x <- matrix(runif(202*263),nrow=202)
(Timings on iMac 2.16GHz and using 64-bit R)

Berend

-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164262.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] What is the best way to lag a time series?

2010-12-26 Thread Bert Gunter
The correct answer to "How to lag..?" is almost certainly, "Don't."
The functionality of numerous time series packages and functions takes
care of this automatically for you (using suitable data structures,
probably). Rather than trying to reinvent wheels, it might be wiser to
consult the Time Series Task View on CRAN to see what's there first.

Incidentally, my limited understanding is that modern time series
methods tend to use more appropriately specified covariance structures
(e.g. ARIMA models) rather than the lagged models of e.g. classical
econometrics. But on this, I would happily stand corrected.
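
As a small illustration of letting the data structure do the alignment, here is a sketch using base R "ts" objects. The series are simulated and the column names y, x0 and x1 are chosen only for the example; stats::lag() is spelled out in full because other packages define their own lag().

```r
set.seed(42)
n <- 100
x <- ts(rnorm(n))
e <- rnorm(n, sd = 0.1)

# Simulated response: y_t = 1 + 0.8 * x_t + 0.4 * x_{t-1} + noise
# (the first value is NA because x_0 does not exist)
y <- ts(c(NA, 1 + 0.8 * x[-1] + 0.4 * x[-n] + e[-1]))

# stats::lag(x, -1) shifts the time base so that, at time t, it holds x_{t-1};
# ts.intersect() keeps only the time points common to all series
d <- ts.intersect(y = y, x0 = x, x1 = stats::lag(x, -1))

fit <- lm(y ~ x0 + x1, data = as.data.frame(d))
coef(fit)
```

The lagged column never has to be trimmed or padded by hand; the time attributes of the series decide which rows line up.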

-- Cheers,
 Bert





-- 
Bert Gunter
Genentech Nonclinical Biostatistics



[R] Calculation of BIC done by leaps-package

2010-12-26 Thread Jan Henckens

Hi Folks,

I've got a question concerning the calculation of the Schwarz-Criterion 
(BIC) done by summary.regsubsets() of the leaps-package:


Using regsubsets() to perform subset selection, I receive a regsubsets
object that can be summarized by summary.regsubsets(). After this
operation the resulting summary contains a vector of BIC values
representing models of size i=1,...,K.


My problem is that I can't reproduce the calculation of these BIC
values. I already tried to use extractAIC(...,k=log(n)),
AIC(...,k=log(n)) and manual calculation using the RSS vector, but none
matches the calculation done by the summary function. I already checked
for constants that could be the reason for the differences, but I found
that the values differ by more than an additive constant.



The source code of the leaps-package states the package calculates the 
BIC this way:


bicvec <- c(bicvec,(n1+ll$intercept)*log(vr)+i*log(n1+ll$intercept))

with:

## number of observations - intercept:
n1 <- ll$nn-ll$intercept
## ratio of the residual sum of squares of model i to the
## residual sum of squares of the null model; I just can't
## understand why the vector ll$ress is doubly subscripted
vr <- ll$ress[i,j]/ll$nullrss
## maximum number of variables
i

^^ This seems to match the calculation done by extractAIC but it doesn't!

Maybe anyone can tell me about the reason of the variation of the 
BIC-values?


Best regards,
Jan Henckens



### Minimal Example:
require(leaps)
bridge <- read.table("http://www.stat.tamu.edu/~sheather/book/docs/datasets/bridge.txt",
                     header=TRUE)

fmla.full <- formula(Time ~ .)
(lm.model <- summary(regsubsets(fmla.full,data=bridge,weights=NULL,
                                intercept=TRUE, method="forward")))

lm.model$bic
### The first two models constructed via lm():
extractAIC(lm(Time~Dwgs,data=bridge),k=log(nrow(bridge)))
extractAIC(lm(Time~Dwgs+Case,data=bridge),k=log(nrow(bridge)))

or see

http://www.henckens.de/min_example.R



--
jan.henckens | jöllenbecker str. 58 | 33613 bielefeld | germany
tel 0521-5251970



Re: [R] Performing basic Multiple Sequence Alignment in R?

2010-12-26 Thread Mike Marchywka



 From: marchy...@hotmail.com
 To: tal.gal...@gmail.com; r-help@r-project.org
 Subject: RE: [R] Performing basic Multiple Sequence Alignment in R?
 Date: Tue, 21 Dec 2010 17:03:17 -0500

  From: tal.gal...@gmail.com
  Date: Tue, 21 Dec 2010 20:17:18 +0200
  Subject: Re: [R] Performing basic Multiple Sequence Alignment in R?
  To: r-help@r-project.org
 
 
  Dear Mike and Thomas,
 
  From what I gathered here (Thanks to Joris Meys):
  http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
  There is an R interface to the MUSCLE algorithm in the bio3d package
  (function seqaln()).
  But not one for clustal.
 
  I will probably end up using pairwiseAlignment on pairs of alignments
  with some sort of stopping rules (I'll have to play with it to see how
  it works).


 http://scholar.google.com/scholar?hl=en&q=%22exact+string+matching%22+alignment

 http://citeseerx.ist.psu.edu/search?q=exact+string+matching+alignment+dna&submit=Search&sort=rel

 Certainly if you are flexible and can use whatever may be close in R that
 is fine but I seem to recall that exact string matching was a fast and
 interesting way to go and maybe some of the authors above, in the interest
 of promoting their work, would help implement an R version if there is demand.

 I seem to recall I did something like building indexes of the strings to be
 aligned first, finding substrings that were unique to a given string but
 appeared only once in each of the sequences to be aligned (this was the
 most restrictive criterion but you can imagine how to make it more
 accommodating). Now that you got me started, up front tokenizing or
 compiling of input sequences (usually no more than indexing them in some
 way) made many later operations like alignment go faster. This may have
 ended up being similar to BLAST but now I can't really recall. Anyway,
 my point here is that somewhere in R there may be packages that
 generate intermediate forms useful across disciplines: mining data from
 text, linguistics, or macromolecule analysis. In fact, the indexing process
 helps find things that have migrated a long way from their original place
 and there are probably other non-alignment related things you could
 get out of the approach.



If you pursue this or make some decision would you please get back to
us, at least me off list? I just went back through my old code and hit the 
search links I posted above, this still seems like quite an interesting
area and the issues do not appear to be confined to bio. Looking at
my method names in my code, it looks like I had a way to supply fixed patterns,
probably from places like PROSITE or CDD, for use as the string you
probably meant to suggest although I seem to think it would make more sense
to discover these based on the strings it finds in the sequences.

I seem to recall I could do 2 sequences reasonably well with some quirks and
limitations but gave up when I tried to do multiple alignments (actually
there was no point at the time). Recent literature seems to still talk about
sub-quadratic time although practically for large sequences the real
execution time could be dominated by VM not algorithm order LOL. The
indexing also makes it possible to find related but distant strings,
something that may be of interest but not normally thought of as alignment
between strings perturbed in limited ways (edit distance being rather
restricted to a few operations).

If you find a specific paper or approach that seems to work that may be
of interest to many here and indeed may be implemented under some other name. 

Thanks.


Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread Dror D Lev
Thank you David, for the reference to Dalgaard's paper in Rnews_2007-2.

Unfortunately I don't seem to have the mathematical-statistical
sophistication required to adapt the example in Dalgaard's paper for my
case.

I hope someone can suggest a less-mathematical direction for solution.

Thanks again,
dror



On Sun, Dec 26, 2010 at 3:59 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote:

  Dear r helpers,

 I would like to look at the interaction between two two-level factors, one
 between and one within participants, after accounting for any variance due
 to practice (31 trials in each of two blocks) in the task.
 It seems to require treating practice as a covariate.

 All the examples I noticed for handling covariates (i.e. ANCOVA, including
 the ones in Faraway's Practical regression and anova using r) use lm(),
 but this doesn't handle repeated-measures.


 See if Dalgaard's piece in R-News offers better guidance:

 http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf




 I thought of a solution in the form of first running a regression on the
 covariate:

 cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)


 and then run the aov() on the residuals:

 m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
              Error(subj/withinVar), data=dat)

 Does it seem to be a valid answer to my problem?

 Is there an existing function that can do this (perhaps more
 appropriately)?

 Thank you for any help,
 dror

 --

 David Winsemius, MD
 West Hartford, CT





Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 9:55 AM, Dror D Lev wrote:

Thank you David, for the reference to Dalgaard's paper in  
Rnews_2007-2.


Unfortunately I don't seem to have the mathematical-statistical  
sophistication required to adapt the example in Dalgaard's paper for  
my case.


I hope someone can suggest a less-mathematical direction for solution.


Here's what I would suggest if you want to stay more concrete. If you  
are not prepared to offer a minimal subset of your own data and also  
provide working or non-working code that uses it, then pick an  
available dataset that resembles it in structure and autocorrelation.  
One possibility would be the BodyWeight dataset in either the nlme or  
the MEMSS packages (although see below for my current level of  
uncertainty regarding your data).


require(nlme)
plot(BodyWeight)
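
A minimal sketch of how one might carry this suggestion a step further. It assumes, and this is an assumption not stated anywhere in the thread, that a random-intercept mixed model from nlme is an acceptable stand-in for a repeated-measures ANOVA with a covariate. In BodyWeight, Diet plays the role of the between-subject factor, Time the role of the within-subject covariate, and Rat the role of the subject.

```r
library(nlme)

# BodyWeight ships with nlme: weight measured repeatedly (Time) on each Rat,
# with rats assigned to one of three Diets (a between-subject factor).
fit <- lme(weight ~ Time * Diet,      # covariate, factor, and their interaction
           random = ~ 1 | Rat,        # one random intercept per subject
           data = BodyWeight)

# Sequential F-tests for the fixed effects
anova(fit)
```

Whether this structure resembles the real experiment closely enough is something only the original poster can judge.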



Thanks again,
dror



On Sun, Dec 26, 2010 at 3:59 PM, David Winsemius dwinsem...@comcast.net 
 wrote:


On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote:

Dear r helpers,

I would like to look at the interaction between two two-level  
factors, one
between and one within participants, after accounting for any  
variance due

to practice (31 trials in each of two blocks) in the task.
It seems to require treating practice as a covariate.


I had trouble figuring out exactly what you meant by 31 trials in two  
blocks. Was that 31 trials by each participant? Or was it two trials  
by each of 31 participants divided unequally into two groups?


--
David.



All the examples I noticed for handling covariates (i.e. ANCOVA,  
including
the ones in Faraway's Practical regression and anova using r) use  
lm(),

but this doesn't handle repeated-measures.

See if Dalgaard's piece in R-News offers better guidance:

http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf




I thought of a solution in the form of first running a regression on  
the

covariate:
cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)

and then run the aov() on the residuals:
m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem?

Is there an existing function that can do this (perhaps more  
appropriately)?


Thank you for any help,
dror
--

David Winsemius, MD
West Hartford, CT




David Winsemius, MD
West Hartford, CT



Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Marius Hofert
Dear David,

thank you for your answer.
As I wrote, I am looking for an option to control the *space* between the tick 
marks and the corresponding labels. I am happy with the *number* of tick marks 
and their default values. As far as I know, pscales can't control the space, so 
it is *not* what I am looking for.

Cheers,

Marius

On 2010-12-26, at 14:36 , David Winsemius wrote:

 
 On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:
 
 Dear expeRts,
 
 how can I decrease the space between the tick marks and the corresponding 
 labels in an splom?
 See here:
 
 library(lattice)
 U <- matrix(runif(4000), ncol = 8)
 splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and
 # tick marks is/seems to be too large
 
 So you want more tick marks?
 
 
 I checked ?panel.pairs but could not find an option for that.
 
 What about the pscales argument?
 
 A single number would increase the number of ticks, or a list with at and 
 labels values can be passed. Seem to be just what you asked for.
 
 --
 
 David Winsemius, MD
 West Hartford, CT
 



[R] can't install R with *local* gcc

2010-12-26 Thread Oliver Kullmann
Hello,

we re-distribute R with our open-source platform http://www.ok-sat-library.org/
where we use R mainly for evaluation of computational experiments.
Due to the various platforms, we build everything from source, and that works 
fine.
Until now, that is: there are circumstances (for example in computer-science 
computer labs)
where no Fortran-compiler is provided, and the users (students) can't change 
that.
Thus we now try to build gfortran as part of the GCC version 4.2.4 suite, and 
building
R using that local gcc.
We already use the local C and C++ compiler of the suite extensively, and that
all works. But we don't have any experience with using gfortran.
The gcc-build works fine, everything seems alright --- only R (version 2.11.0) 
won't build with it:

We use the configuration

F77=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran \
FC=${F77} \
CC=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gcc \
CXX=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/g++ \
LDFLAGS="-L /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib" \
./configure --prefix=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/R/2.11.0

(the same problems with lib64 instead of lib, by the way)

which yields

checking for Fortran 77 libraries of 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran...
  
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib
 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4
 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../lib64
 -L/lib/../lib64 -L/usr/lib/../lib64 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../..
 -lgfortranbegin -lgfortran -lm 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/libgfortran.a

which looks alright to me (but I don't know Fortran), but then we get

checking for dummy main to link with Fortran 77 libraries... none
checking for Fortran 77 name-mangling scheme... lower case, underscore, no 
extra underscore
checking whether 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
 appends underscores to external names... yes
checking whether 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
 appends extra underscores to external names... no
checking whether mixed C/Fortran code can be run... configure: WARNING: cannot 
run mixed C/Fortran code
configure: error: Maybe check LDFLAGS for paths to Fortran libraries?
make: *** [R_base] Error 1

The R installation-documentation doesn't say much on using local compilers 
(more or less nothing), and everything we could
get from it are the above settings of environment variables.

Internet search reveals old stuff on libg2c which appears not to exist 
anymore, some recommendations
not to build from sources (which is not an option for us), an open Sage ticket 
(apparently without any
further work on it), and a request to the R-list with apparently no reply.

Since we are working in a well-defined setting (gcc is fully under our 
control), and apparently
all the libraries needed are built by gcc (though this is nowhere said or 
(dream) specified),
it should be possible to solve that problem.

I very much hope to get some hints (we can't get R running (for our system!) 
otherwise).
The error is exactly the same on various systems (all 64-bit machines, Intel 
and AMD).
If we use the system-gcc (4.5.0 or 4.1.2) then the installation of R works 
without problems;
here (for one of the machines) some data

 version
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          11.0
year           2010
month          04
day            22
svn rev        51801
language       R
version.string R version 2.11.0 (2010-04-22)

Thanks for your help in any case!

Oliver



Re: [R] package arules - 'transpose' of the transactions

2010-12-26 Thread Michael Hahsler

Hi Kohleth,


Suppose this is my list of transactions:


set.seed(200)

tran=random.transactions(100,3)

inspect(tran)

  items     transactionID
1 {item80}  trans1
2 {item8,
   item20}  trans2
3 {item28}  trans3


I want to get the 'transpose' of the data, i.e.

  transactionID  items
1 {trans2}       item8
2 {trans2}       item20
3 {trans3}       item28
4 {trans1}       item80



This is not the transpose. The data structure you want can be created 
this way:


 l <- LIST(tran)
 single <- data.frame(ID=rep(names(l), lapply(l, length)), 
items=unlist(l), row.names=NULL)

 single
  ID  items
1 trans1 item80
2 trans2  item8
3 trans2 item20
4 trans3 item28
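
For readers without arules at hand, the same reshaping step can be sketched in base R. This assumes a plain named list as a stand-in for what LIST(tran) returns (one character vector of items per transaction); the list contents are copied from the example output above, and lengths() is used in place of lapply(l, length).

```r
# Hypothetical stand-in for LIST(tran): one character vector per transaction
l <- list(trans1 = "item80",
          trans2 = c("item8", "item20"),
          trans3 = "item28")

# Repeat each transaction ID once per item, then flatten the items
single <- data.frame(ID    = rep(names(l), lengths(l)),
                     items = unlist(l, use.names = FALSE),
                     stringsAsFactors = FALSE)
single
```

Both rep() and unlist() are vectorised, so no explicit loop over transactions is involved.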



I tried converting tran into a matrix, then transpose it, then convert it
back to transactions. But my dataset is actually very very large, so I
wonder if there is any faster method?


The method above should be very fast.

-Michael



Thanks





--
  Dr. Michael Hahsler, Visiting Assistant Professor
  Department of Computer Science and Engineering
  Lyle School of Engineering
  Southern Methodist University, Dallas, Texas

  (214) 768-8878 * mhahs...@lyle.smu.edu * http://lyle.smu.edu/~mhahsler



Re: [R] Lost in POSIX

2010-12-26 Thread Jeff Newmiller

Dimitri Shvorob wrote:
 df = structure(list(t = structure(c(1033963406.044, 1033974144.847,
 1033988418.836), class = c("POSIXt", "POSIXct"))), .Names = "t",
 row.names = c(NA, 3L), class = "data.frame")
 df$min = trunc(df$t, units="mins")


does not work, Jeff; you will see that my original post suggests familiarity
with 'trunc' :) 


Well, perhaps you should read the error message or the Value section of 
?trunc.POSIXt, and convert the result to a compact type...


 df$min <- trunc( df$t, units="mins" )
Error in `$<-.data.frame`(`*tmp*`, "min", value = list(sec = 0, min = c(3L,  :
  replacement has 9 rows, data has 3
 df$min <- as.POSIXct( trunc( df$t, units="mins" ) )
 str(df)
'data.frame':   3 obs. of  2 variables:
 $ t  : POSIXct, format: "2002-10-06 21:03:26" "2002-10-07 00:02:24" ...
 $ min: POSIXct, format: "2002-10-06 21:03:00" "2002-10-07 00:02:00" ...
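
A self-contained sketch of the same point, using the three timestamps from the post. The tz = "UTC" setting is my own assumption, added only so the result does not depend on the local time zone. trunc() on a POSIXct vector returns a POSIXlt object (a list of date-time fields), which is why converting back with as.POSIXct() before storing it in a data frame column is the safe route.

```r
# Timestamps from the post; tz = "UTC" is an assumption for reproducibility
t  <- as.POSIXct(c(1033963406.044, 1033974144.847, 1033988418.836),
                 origin = "1970-01-01", tz = "UTC")
df <- data.frame(t = t)

tr <- trunc(df$t, units = "mins")  # POSIXlt: a list-based representation
class(tr)

df$min <- as.POSIXct(tr)           # compact type, one value per row
str(df)
```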


--
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] Lost in POSIX

2010-12-26 Thread Jeff Newmiller

Dimitri Shvorob wrote:

.. One issue with the solution proposed by Jeff is that the transformed
column does not have the original's type:


x = structure(list(time = structure(c(1020232904.818, 1020232904.818
), class = c("POSIXt", "POSIXct"), tzone = ""), price = c(321, 
323.5), minute = c(1020232860, 1020232860)), .Names = c("time", 
"price", "minute"), row.names = 1:2, class = "data.frame")


minute <- function(t)
{ 
  d <- as.POSIXlt(t, origin = as.Date("1970-01-01")) 
  d$sec <- 0 
  as.POSIXct(d) 
} 

x$minute = sapply(x$time, minute)  



head(x)

 time price minute
1 2002-05-01 07:01:44 321.0 1020232860
2 2002-05-01 07:01:44 323.5 1020232860


class(x.l$minute)

[1] "numeric"



That is not an issue with the minute function, as you can see if you
evaluate

 minute(x$time)
[1] "2002-04-30 23:01:00 PDT" "2002-04-30 23:01:00 PDT"

or

 str(minute(x$time))
 POSIXct[1:2], format: "2002-04-30 23:01:00" "2002-04-30 23:01:00"

rather, you are seeing a side effect of sapply.
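
A small sketch of that side effect. The minute() below is a simplified stand-in built on trunc() rather than the function quoted above, and tz = "UTC" is my own assumption for reproducibility. sapply() simplifies its per-element results into a bare vector, which drops the POSIXct class; calling the vectorised function directly keeps it.

```r
# Simplified stand-in for the poster's minute(): truncate to whole minutes
minute <- function(t) as.POSIXct(trunc(t, units = "mins"))

time <- as.POSIXct(c(1020232904.818, 1020232904.818),
                   origin = "1970-01-01", tz = "UTC")

via_sapply <- sapply(time, minute)  # simplification strips the class
direct     <- minute(time)          # vectorised call keeps POSIXct

class(via_sapply)
class(direct)

# If sapply() must be used, the class can be restored afterwards
restored <- as.POSIXct(via_sapply, origin = "1970-01-01", tz = "UTC")
```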

--
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] Lost in POSIX

2010-12-26 Thread David Winsemius


On Dec 25, 2010, at 2:25 PM, Dimitri Shvorob wrote:



df = structure(list(t = structure(c(1033963406.044, 1033974144.847,
1033988418.836), class = c("POSIXt", "POSIXct"))), .Names = "t",
row.names = c(NA, 3L), class = "data.frame")
df$min = trunc(df$t, units="mins")

does not work,


??? seems to work on my system. Perhaps you should say what you mean  
by "not work".


 df
                    t                 min
1 2002-10-07 00:03:26 2002-10-07 00:03:00
2 2002-10-07 03:02:24 2002-10-07 03:02:00
3 2002-10-07 07:00:18 2002-10-07 07:00:00
 sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid  splines   stats graphics  grDevices utils  
datasets  methods   base


other attached packages:
 [1] nlme_3.1-97lme4_0.999375-37   Matrix_0.999375-46  
zoo_1.6-4  ggplot2_0.8.8  proto_0.3-8 
reshape_0.8.3  plyr_1.2.1 MASS_7.3-9
[10] rms_3.1-0  Hmisc_3.8-3survival_2.36-2 
sos_1.3-0  brew_1.0-4 lattice_0.19-13


loaded via a namespace (and not attached):
[1] cluster_1.13.2 stats4_2.12.1  tools_2.12.1



Jeff; you will see that my original post suggests familiarity
with 'trunc' :)


--
View this message in context: 
http://r.789695.n4.nabble.com/Lost-in-POSIX-tp3052768p3163914.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



[R] A question on Statistics

2010-12-26 Thread Maithula Chandrashekhar
I am not a pure Statistics background and therefore please forgive me if
this question (which is not R related either) is too trivial.

In many Statistics literature I find following statement: restrictions in
different coefficients matrices have to be imposed to ensure uniqueness of
the parametrization. Can somebody tell me what is the meaning of Uniqueness
in the parametrization? Does it mean that, two different coefficient
matrices may give exactly the same result, and therefore coefficient matrix
is not unique?

I find there are many members (perhaps all) in this forum who are really
masters in Statistics. Therefore I hope somebody will clarify me with the
intuition behind that.

Thanks,



Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread RICHARD M. HEIBERGER
Dror,

Please look at the
demo(MMC.apple)
in the HH package

install.packages("HH") ## if you don't already have it.
library(HH)
demo(MMC.apple)

Please reply to the list if there are further queries.

Rich

On Sun, Dec 26, 2010 at 7:42 AM, Dror D Lev dror.te...@gmail.com wrote:

 Dear r helpers,

 I would like to look at the interaction between two two-level factors, one
 between and one within participants, after accounting for any variance due
 to practice (31 trials in each of two blocks) in the task.
 It seems to require treating practice as a covariate.

 All the examples I noticed for handling covariates (i.e. ANCOVA, including
 the ones in Faraway's Practical regression and anova using r) use lm(),
 but this doesn't handle repeated-measures.

 I thought of a solution in the form of first running a regression on the
 covariate:
  cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)

 and then run the aov() on the residuals:
  m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
 Error(subj/withinVar), data=dat)

 Does it seem to be a valid answer to my problem?

 Is there an existing function that can do this (perhaps more
 appropriately)?

 Thank you for any help,
 dror



Re: [R] can't install R with *local* gcc

2010-12-26 Thread peter dalgaard

On Dec 26, 2010, at 17:50 , Oliver Kullmann wrote:

 Hello,
 
 we re-distribute R with our open-source platform 
 http://www.ok-sat-library.org/
 where we use R mainly for evaluation of computational experiments.
 Due to the various platforms, we build everything from source, and that works 
 fine.
 Until now, that is: there are circumstances (for example in computer-science 
 computer labs)
 where no Fortran-compiler is provided, and the users (students) can't change 
 that.
 Thus we now try to build gfortran as part of the GCC version 4.2.4 suite, and 
 building
 R using that local gcc.
 We already use the local C and C++ compiler of the suite extensively, and that
 all works. But we don't have any experience with using gfortran.
 The gcc-build works fine, everything seems alright --- only R (version 
 2.11.0) won't build with it:
 
 We use the configuration
 
 F77=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran \
 FC=${F77} \
 CC=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gcc \
 CXX=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/g++ \
 LDFLAGS="-L /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib" \
 ./configure --prefix=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/R/2.11.0
 
 (the same problems with lib64 instead of lib, by the way)
 
 which yields
 
 checking for Fortran 77 libraries of 
 /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran...
   
 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib
  
 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4
  
 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../lib64
  -L/lib/../lib64 -L/usr/lib/../lib64 
 -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../..
  -lgfortranbegin -lgfortran -lm 
 /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/libgfortran.a
 
 which looks alright to me (but I don't know Fortran), but then we get
 
 checking for dummy main to link with Fortran 77 libraries... none
 checking for Fortran 77 name-mangling scheme... lower case, underscore, no 
 extra underscore
 checking whether 
 /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
  appends underscores to external names... yes
 checking whether 
 /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
  appends extra underscores to external names... no
 checking whether mixed C/Fortran code can be run... configure: WARNING: 
 cannot run mixed C/Fortran code
 configure: error: Maybe check LDFLAGS for paths to Fortran libraries?
 make: *** [R_base] Error 1
 
 The R installation-documentation doesn't say much on using local compilers 
 (more or less nothing), and everything we could
 get from it are the above settings of environment variables.
 
 Internet search reveals old stuff on libg2c which appears not to exist 
 anymore, some recommendations
 not to build from sources (which is not an option for us), an open Sage 
 ticket (apparently without any
 further work on it), and a request to the R-list with apparently no reply.
 
 Since we are working in a well-defined setting (gcc is fully under our 
 control), and apparently
 all the libraries needed are built by gcc (though this is nowhere said or 
 (dream) specified),
 it should be possible to solve that problem.
 
 I very much hope to get some hints (we can't get R running (for our system!) 
 otherwise).
 The error is exactly the same on various systems (all 64-bit machines, Intel 
 and AMD).
 If we use the system-gcc (4.5.0 or 4.1.2) then the installation of R works 
 without problems;
 here (for one of the machines) some data

I suppose r-devel would be a better mailing list for this sort of thing, but 
since we're here:

Hint #1: Expect the process to be somewhat painful...
Hint #2: Study the configure script and config.log to the level where you can 
reproduce the  mixed C/Fortran code that it is trying to build and run and with 
which commands it is trying to build it
Hint #3: Figure out what it really should have done to build such code

An alternative hint is first to try setting up a very simple Fortran function 
to, say, double a number, and a C main program that calls it. Then try figuring 
out the compiler/linker options to make it work. (That is of course what 
configure was trying to do in the first place, but doing it by hand might be 
less prone to getting multiple toolchains mixed up.)


 
 version
 platform   x86_64-unknown-linux-gnu 
 arch   x86_64   
 os

Re: [R] A question on Statistics

2010-12-26 Thread Bert Gunter
Maithula:

On Sun, Dec 26, 2010 at 11:09 AM, Maithula Chandrashekhar
m.chandrashekhar1...@gmail.com wrote:
 I am not a pure Statistics background and therefore please forgive me if
 this question (which is not R related either) is too trivial.

 In many Statistics literature I find following statement: restrictions in
 different coefficients matrices have to be imposed to ensure uniqueness of
 the parametrization. Can somebody tell me what is the meaning of Uniqueness
 in the parametrization? Does it mean that, two different coefficient
 matrices may give exactly the same result, and therefore coefficient matrix
 is not unique?
-- yes.

See the section on contrast matrices in Venables and Ripley's
Modern Applied Statistics with S (MASS) for a concise but, I think,
illuminating explanation. (It's in the chapter on linear
models/regression).

-- Bert


 I find there are many members (perhaps all) in this forum who are really
 masters in Statistics. Therefore I hope somebody will clarify me with the
 intuition behind that.

 Thanks,

        [[alternative HTML version deleted]]





-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml



[R] Question about mars() -function

2010-12-26 Thread Tiina Hakanen

Hi!

I have some questions about the MARS model's coefficient of determination.
I use the MARS method in my master's thesis and I have noticed some
problems with the MARS model's R^2.

As you can see in the following example, the MARS model's R^2 is too big
when I have used the mars() function to build the MARS model, whereas when I
build the same model with a linear regression, it gives a much smaller
R^2.


So can you please tell me why the MARS model's R^2 is so big? How can I get
the MARS model's correct R^2 in R, in some way other than the following
example or calculating it myself from the R^2 formula?


I hope you can reply soon.

Best regards,

Tiina Hakanen


library(ElemStatLearn)
library(mda)
data <- ozone
m <- mars(data[,-1], data[,1], nk=4)
m$factor[m$s,]
m$cuts[m$s,]
m$coef
marsmodel <- lm(data[,1]~m$x-1)
summary(marsmodel)

Call:
lm(formula = data[, 1] ~ m$x - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-36.264 -15.993  -2.351   9.993 122.793

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
m$x1  52.9783     3.8894  13.621  < 2e-16 ***
m$x2   4.7383     0.9599   4.936 2.92e-06 ***
m$x3  -1.9428     0.3084  -6.300 6.61e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.38 on 108 degrees of freedom
Multiple R-squared: 0.8147, Adjusted R-squared: 0.8095
F-statistic: 158.2 on 3 and 108 DF,  p-value: < 2.2e-16

knot1 <- function(x, k) ifelse(x > k, x-k, 0)
knot2 <- function(x, k) ifelse(x < k, k-x, 0)
reg <- lm(ozone ~knot1(temperature,85)+knot2(temperature,85),data=data)

summary(reg)

Call:
lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature,
85), data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-36.264 -15.993  -2.351   9.993 122.793

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             52.9783     3.8894  13.621  < 2e-16 ***
knot1(temperature, 85)   4.7383     0.9599   4.936 2.92e-06 ***
knot2(temperature, 85)  -1.9428     0.3084  -6.300 6.61e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.38 on 108 degrees of freedom
Multiple R-squared: 0.5153, Adjusted R-squared: 0.5064
F-statistic: 57.42 on 2 and 108 DF,  p-value: < 2.2e-16
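
One possible explanation, offered as a sketch on simulated data rather than a confirmed diagnosis of the ozone example: the two fits above have identical coefficients and residuals, and differ only in that the first formula ends in "- 1". When a model is fitted without an intercept term, summary.lm() computes R^2 against an uncentered total sum of squares (sum of y^2) rather than the usual centered one, even if the design matrix happens to contain a constant column. The variable names and numbers below are invented for illustration.

```r
set.seed(1)
x <- runif(100, 50, 100)
y <- 40 + 0.5 * x + rnorm(100, sd = 20)

X <- cbind(1, x)               # design matrix that already holds a constant column

fit_noint <- lm(y ~ X - 1)     # same fitted values, but lm() sees "no intercept"
fit_int   <- lm(y ~ x)         # ordinary fit with an intercept

r2_noint <- summary(fit_noint)$r.squared   # uses sum(y^2) as total SS
r2_int   <- summary(fit_int)$r.squared     # uses sum((y - mean(y))^2)

# Centered R^2 computed by hand from the no-intercept fit
centered <- 1 - sum(resid(fit_noint)^2) / sum((y - mean(y))^2)

c(no_intercept = r2_noint, with_intercept = r2_int, by_hand = centered)
```

If this is indeed what happens with m$x, the hand-computed centered value would be the one comparable to the second summary.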



[R] read.data? without separator

2010-12-26 Thread Fror

Hello,

I have a problem with read.data. For example I have a file

# comment
1?0001010101
101010??1010

with a comment on the first line and data laid out without a separator. How can
I read the data so that each character/sign ends up in its own column? It is
probably trivial, but I have no idea how to do it.
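
A minimal sketch of one way this could be approached, assuming the lines are first read as plain text (for example with readLines() on the real file; here the example content is typed in directly) and that every data line has the same number of characters.

```r
# Stand-in for readLines("yourfile.txt"); the file name would be hypothetical
txt <- c("# comment",
         "1?0001010101",
         "101010??1010")

lines <- txt[!grepl("^#", txt)]              # drop comment lines
m <- do.call(rbind, strsplit(lines, ""))     # one character per column
m
dim(m)
```

The result is a character matrix; as.data.frame(m) would turn it into a data frame with one column per position.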

Thanks,
Kacper
-- 
View this message in context: 
http://r.789695.n4.nabble.com/read-data-without-separator-tp3164358p3164358.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] T2 hoteling

2010-12-26 Thread Jim Lemon

On 12/27/2010 12:43 AM, leyla khodakarim wrote:

Dear All

It is very kind of you to guide me.

When I want to run this line, I see this error

stat.obs <- apply(GS, 2, function(z) Hott2(t(DATA[which(z==1),]), cl))

Error in colSums(w * x) : 'x' must be an array of at least two dimensions

cl <- as.factor(y)

GS: a matrix with 0 or 1

GS: gene sets

-  a data matrix with rows=genes,

columns= gene sets,

GS[i,j]=1 if gene i in gene set j

GS[i,j]=0 otherwise

Hott2 <- function(x, y, var.equal=TRUE) #T2 hoteling

Y <- c(1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,1,0,1,0,1)

Data=transpose(X)= gene expression: row=40 gene, column=10 sample

Data: there is in attachment file


Hi Leyla,
Your attachment didn't make it to the list, but the problem may be that 
which(z==1) reduces the matrix (array? data frame?) X to a vector. One 
other thing that looks funny is the capitalization. In R, X and x are 
different, as are DATA and Data. First thing is to just print out the 
data you are trying to analyze:


DATA[which(z==1)]

and see if it really is an array with at least two dimensions.
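
A small sketch of how that reduction to a vector can happen, using a made-up matrix rather than the real expression data: when the row index selects exactly one row, R drops the dimension unless drop = FALSE is given.

```r
DATA <- matrix(1:12, nrow = 4)   # toy stand-in: 4 genes x 3 samples
z    <- c(0, 1, 0, 0)            # a gene set containing a single gene

dropped <- DATA[which(z == 1), ]                 # plain vector, no dim
kept    <- DATA[which(z == 1), , drop = FALSE]   # still a 1 x 3 matrix

dim(dropped)
dim(kept)
```

If some gene sets in GS contain only one gene, adding drop = FALSE inside the apply() call may be enough to avoid the colSums() error, though that is a guess without seeing the data.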

Jim



[R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson

Hello R Folks...

I've been looking around the 'net and I see many complex solutions in  
various languages to this question, but I have a pretty simple need  
(and I'm not much good at regex).  I want to use a chemical formula as  
a function argument.  The formula would be in Hill order which is to  
list C, then H, then all other elements in alphabetical order.  My  
example will have only a limited number of elements, few enough that  
one can search directly for each element.  So some examples would be  
C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or  
Br, there is no following number meaning a 1 is implied).


Let's say

 form <- "C5H11BrO"

I'd like to get the count of each element, so in this case I need to  
extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the  
molecular weight by multiplying).  Sounds pretty simple, but my  
experiments with grep and strsplit don't immediately clue me into an  
obvious solution.  As I said, I don't need a general solution to the  
problem of calculating molecular weight from an arbitrary formula,  
that seems quite challenging, just a way to convert form into a list  
or data frame which I can then do the math on.


Here's hoping this is a simple issue for more experienced R users!   
TIA,  Bryan

***
Bryan Hanson
Professor of Chemistry & Biochemistry



Re: [R] levelplot blocks size

2010-12-26 Thread jonathan

Thanks for your advice, but my data is not decimals, so I don't need to round
the values. Instead, what I need to really do is group the values into
larger blocks.

My data looks sort of like this:

x  y  z
0  0  687
0  1  64
0  2  71
0  3  55
0  4  52
0  5  51
0  6  38
0  7  38
0  8  54
0  9  49
. 
. 
. 
987   9881
999   9981
999   9991


But what I need is for the graph to show bigger points rather than a
tiny dot for each value (as in the bigplot diagram). For example,
0 <= x < 10, 0 <= y < 10 would be one point in the lower left, rather
than 100 separate points, one per x,y value.

The same strategy should then be applied to the whole graph.

Any ideas how to achieve this? I'm sure this is quite a common thing to
want to do with heatmaps.

Thanks,

Jonathan
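
One possible way to get such blocks, sketched under assumptions: the data frame d below is made up to stand in for the x, y, z columns, the block size of 10 comes from the description above, and summing z within a block is only one choice (a mean may suit other data). The idea is to map each point to its block with integer division, aggregate, and plot the aggregated table:

```r
library(lattice)

# Made-up data on a 100 x 100 grid, standing in for the x, y, z columns.
set.seed(1)
d <- expand.grid(x = 0:99, y = 0:99)
d$z <- rpois(nrow(d), lambda = 50)

# Map each point to the lower-left corner of its 10 x 10 block.
block <- 10
d$xb <- (d$x %/% block) * block
d$yb <- (d$y %/% block) * block

# One row per block; FUN = sum is an assumption, not the poster's choice.
agg <- aggregate(z ~ xb + yb, data = d, FUN = sum)

# Each cell of the plot now covers a whole block.
print(levelplot(z ~ xb * yb, data = agg))
```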



[R] GLS with corAR(1) correlation structure residual/standard error calculation

2010-12-26 Thread Katharina Ley
I am using the gls function to fit a two-stage least squares model with
first order autoregressive error terms. Since there is no automated
adjustment for the use of two-stage least squares in this package, I am
trying to manually replicate standard errors of the coefficient estimates in
order to adjust for a first stage OLS estimate of endogenous variables.
However, thus far I have been unable to replicate the residuals or standard
errors produced by this function. My understanding is outlined below, but
using this approach does not yield the reported results. Is anyone familiar
with the inner workings of this function who can either explain how the
standard errors are calculated or provide code that shows what the
function does?

Thanks!

Example of the model I am running:
model1 <- gls(Y ~ X1I + X2 + X3 + X4, data = Dat1, correlation = corAR1(),
method = "ML")

My understanding of model errors:
Y = b_0 + X1 b_1 + ... + Xk b_k + Z
Z_t = phi Z_{t-1} + e_t

The residuals reported by GLS are the Z's, while the white noise terms are
the e's. I cannot replicate the reported residuals using this approach. I
also do not know how Z_0 should be calculated, i.e. what does the first step
of this recursive procedure look like?

From the residuals, I also cannot replicate the reported standard errors. I
am using se(b_j) = sqrt(sigma^2 / sum((x_i - x_mean)^2)) where sigma = sqrt(SSR/df)

Any help on this or explanation of how GLS works would be much appreciated.

Any clarification would be much appreciated.
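
A rough sketch of how the pieces fit together, on simulated data (the data, the variable names and the seed are all made up for illustration). It assumes the usual GLS algebra: with R the AR(1) correlation matrix, R[i, j] = phi^|i - j|, the coefficients are (X' R^-1 X)^-1 X' R^-1 y and their covariance is sigma^2 (X' R^-1 X)^-1, so the simple-regression formula for se(b_j) does not apply. The first observation is covered by the stationary correlation matrix, so no separate Z_0 is needed. nlme may apply a small degrees-of-freedom adjustment to the reported standard errors, so the last comparison is expected to be close rather than exact:

```r
library(nlme)

# Simulated data (hypothetical): y = 1 + 2 x + Z, with Z following an AR(1).
set.seed(42)
n <- 200
x <- rnorm(n)
z <- as.numeric(arima.sim(list(ar = 0.6), n = n))
dat <- data.frame(y = 1 + 2 * x + z, x = x)

fit <- gls(y ~ x, data = dat, correlation = corAR1(), method = "ML")

# Estimated phi on its natural (constrained) scale.
phi <- coef(fit$modelStruct$corStruct, unconstrained = FALSE)

# AR(1) correlation matrix implied by phi.
R <- phi^abs(outer(seq_len(n), seq_len(n), "-"))
X <- model.matrix(~ x, data = dat)
XtRinv <- t(X) %*% solve(R)

# GLS coefficients and the response residuals (the Z's in the notation above).
beta <- drop(solve(XtRinv %*% X, XtRinv %*% dat$y))
res_manual <- drop(dat$y - X %*% beta)

# Standard errors from sigma^2 * (X' R^-1 X)^-1.
se_manual <- sqrt(diag(fit$sigma^2 * solve(XtRinv %*% X)))

# Side-by-side comparison with what gls stores.
cbind(manual = se_manual, gls = sqrt(diag(fit$varBeta)))
```

The whitened e's correspond to residuals(fit, type = "normalized"), while the default residuals(fit) are the Z's.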



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread jim holtman
try this:

> f.extract <- function(formula)
+ {
+     # pattern to match the initial chemical
+     # assumes chemical starts with an upper case and optional lower case
+     # followed by zero or more digits.
+     first <- "^([[:upper:]][[:lower:]]?)([0-9]*).*"
+     # inverse of above to remove the initial chemical
+     last <- "^[[:upper:]][[:lower:]]?[0-9]*(.*)"
+     result <- list()
+     extract <- formula
+     # repeat as long as there is data
+     while ((start <- nchar(extract)) > 0){
+         chem <- sub(first, '\\1 \\2', extract)
+         extract <- sub(last, '\\1', extract)
+         # if the number of characters is the same, then there was an error
+         if (nchar(extract) == start){
+             warning("Invalid formula:", formula)
+             return(NULL)
+         }
+         # append to the list
+         result[[length(result) + 1L]] <- strsplit(chem, ' ')[[1]]
+     }
+     result
+ }
> f.extract("C5H11BrO")
[[1]]
[1] "C" "5"

[[2]]
[1] "H"  "11"

[[3]]
[1] "Br"

[[4]]
[1] "O"

> f.extract("H2O")
[[1]]
[1] "H" "2"

[[2]]
[1] "O"

> f.extract("CCC")
[[1]]
[1] "C"

[[2]]
[1] "C"

[[3]]
[1] "C"

> f.extract("Crr")  # bad
NULL
Warning message:
In f.extract("Crr") : Invalid formula:Crr
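
The list above still leaves the implied count of 1 to be filled in. A small sketch of that last step, assuming a list shaped like the f.extract output (the object name parts and the column names are made up here):

```r
# A list shaped like the output shown for "C5H11BrO".
parts <- list(c("C", "5"), c("H", "11"), "Br", "O")

# First entry of each piece is the element symbol.
element <- vapply(parts, `[`, character(1), 1)

# Second entry, when present, is the count; otherwise 1 is implied.
count <- vapply(parts,
                function(p) if (length(p) == 2) as.numeric(p[2]) else 1,
                numeric(1))

data.frame(element = element, count = count, stringsAsFactors = FALSE)
```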






-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread David A. Johnston

There might be something simpler, but this is what I came up with:

form = "C5H11BrO"
ups = c(gregexpr("[[:upper:]]", form)[[1]], nchar(form) + 1)
seperated = sapply(1:(length(ups)-1), function(x) substr(form, ups[x],
ups[x+1] - 1))
elements =  gsub("[[:digit:]]", "", seperated)
nums = gsub("[[:alpha:]]", "", seperated)
ans = data.frame(element = as.character(elements),
  num = as.numeric(ifelse(nums == "", 1, nums)), stringsAsFactors = FALSE)


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Gabor Grothendieck

This can be done with strapply in the gsubfn package.  It matches the
regular expression against the target string and passes the back
references (the parenthesized portions of the regular expression) to a
specified function as successive arguments.

Thus the first arg is form, your input string.  The second arg is the
regular expression, which matches an upper case letter optionally
followed by lower case letters, all optionally followed by digits.  The
third arg is a function written in formula notation; strapply passes the
two back references to it as its two arguments.  The simplify arg is
another function in formula notation, which turns the result into a
matrix and then a data frame.  Finally we make the second column of the
data frame numeric.

library(gsubfn)

DF <- strapply(form,
   "([A-Z][a-z]*)(\\d*)",
   ~ c(..1, if (nchar(..2)) ..2 else 1),
   simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
DF[[2]] <- as.numeric(DF[[2]])

DF looks like this:

> DF
  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1
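
For comparison, a base-R sketch of the same idea that also carries the calculation through to the molecular weight. The function name parse_formula and the small atomic_weight lookup are made up for this illustration, and the table covers only the four elements in the example:

```r
# Split a simple formula such as "C5H11BrO" into elements and counts.
parse_formula <- function(form) {
  # One token per element: capital, optional lower case letters, optional digits.
  tokens <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]
  element <- sub("[0-9]*$", "", tokens)      # strip the trailing digits
  count <- sub("^[A-Za-z]+", "", tokens)     # strip the leading letters
  count[count == ""] <- "1"                  # a missing count implies 1
  data.frame(element = element, count = as.numeric(count),
             stringsAsFactors = FALSE)
}

# Standard atomic weights for the elements used here (illustrative subset).
atomic_weight <- c(C = 12.011, H = 1.008, O = 15.999, Br = 79.904)

df <- parse_formula("C5H11BrO")
mw <- sum(df$count * atomic_weight[df$element])
mw
```

An element missing from the lookup table yields NA, which is a cheap way to notice symbols the table does not cover.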



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] read.data? without separator

2010-12-26 Thread jim holtman
I have a problem with 'read.data' also, in that I don't see it as a
function in 'base'; I assume you meant read.table.

Also, you did not indicate if all the lines were the same length.  Here
is a solution that returns a list with each character broken out
separately.

> x <- readLines(textConnection("# comment
+ 1?0001010101
+ 101010??1010"))
> closeAllConnections()
> # split lines 2-n into a list of separate characters
> result <- lapply(x[-1], function(.line) strsplit(.line, '')[[1]])
> result
[[1]]
 [1] "1" "?" "0" "0" "0" "1" "0" "1" "0" "1" "0" "1"

[[2]]
 [1] "1" "0" "1" "0" "1" "0" "?" "?" "1" "0" "1" "0"
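
If every data line has the same length, the list can be bound into a table with one character per column, which is what the question asks for. A small sketch under that assumption (the names lines, data_lines, m and df are made up here):

```r
# Same input as above, as a character vector of lines.
lines <- c("# comment", "1?0001010101", "101010??1010")

# Drop comment lines rather than assuming there is exactly one.
data_lines <- lines[!grepl("^#", lines)]

# One row per line, one column per character; needs equal line lengths.
m <- do.call(rbind, strsplit(data_lines, ""))
df <- as.data.frame(m, stringsAsFactors = FALSE)
dim(df)
```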



On Sun, Dec 26, 2010 at 1:04 PM, Fror f...@interia.pl wrote:

 Hello,

 I have a problem with read.data. For example I have a file

 # comment
 1?0001010101
 101010??1010

 with comment on first line and data layout without separator. How I could
 read data that each character\sign was in another column. It is trivial
 probably, but I have no idea for it.

 Thank's,
 Kacper




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread David Winsemius



Well here's how I see it:

The form can be split with a regular expression: a capital letter
followed by zero or one lower case letter, followed by any number of
digits.

greg <- gregexpr("[A-Z]{1}[a-z]?[0-9]*", form)

Append a number equal to one more than the length of the string, for
reasons that will become clear:

ugreg <- c(unlist(greg), nchar(form)+1)

Then use the substr function to pick serially from one split point to
one less than the next split point (for the last element that is the end
of the string, which is why nchar(form)+1 was appended):

> sapply(1:(length(ugreg)-1), function(z) substr(form, ugreg[z], ugreg[z+1]-1) )
[1] "C5"  "H11" "Br"  "O"

Then you can split these triples (cap,lower,n) and if n is absent
assume 1.

> sub("(\\d*)$", "", sapply(1:(length(ugreg)-1),   # blank out the digits
+     function(z) substr(form, ugreg[z], ugreg[z+1]-1) ) )
[1] "C"  "H"  "Br" "O"

> sub("^$", "1", sub("([A-Za-z]*)", "",   # subst 1 for empty strings
+     sapply(1:(length(ugreg)-1),
+       function(z) substr(form, ugreg[z], ugreg[z+1]-1) ) ) )
[1] "5"  "11" "1"  "1"

If you limited the number of elements searched for, it might improve
the error trapping, I suppose.


--
David.





David Winsemius, MD
West Hartford, CT



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson
Well let me just say thanks and WOW!  Four great ideas, each worthy of  
study and I'll learn several things from each.  Interestingly, these  
solutions seem more general and more compact than the solutions I  
found on the 'net using python and perl.  More evidence for the power  
of R!  A big thanks to each of you!  Bryan




Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Spencer Graves

  Have you considered the 'CHNOSZ' package?


> makeup("C5H11BrO")
   count
C      5
H     11
Br     1
O      1


  I found this using the 'sos' package as follows:


> library(sos)
> cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
> cf


  The print method for cf opened the results in a web browser, 
which showed that the CHNOSZ package had 14 of these 21 matches, and 
the other 7 were in 7 different packages.  Moreover, the CHNOSZ 
package is devoted to Chemical Thermodynamics and Activity Diagrams 
and provides many more capabilities that might interest you.



  Hope this helps.
  Spencer







--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Spencer Graves
p.s.  help(pac=CHNOSZ) reveals that this package has 3 vignettes.  I 
have not looked at these vignettes, but most vignettes provide excellent 
introductions (though rarely with complete coverage) of important 
capabilities of the package.  (The 'sos' package includes a vignette, 
which exposes more capabilities than the example below.)



##
  Have you considered the 'CHNOSZ' package?



> makeup("C5H11BrO")
   count
C      5
H     11
Br     1
O      1


  I found this using the 'sos' package as follows:


> library(sos)
> cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
> cf


  The print method for cf opened the results in a web browser, 
which showed that the CHNOSZ package had 14 of these 21 matches, and 
the other 7 were in 7 different packages.  Moreover, the CHNOSZ 
package is devoted to Chemical Thermodynamics and Activity Diagrams 
and provides many more capabilities that might interest you.



  Hope this helps.
  Spencer







--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567



[R] Drop column from a data frame

2010-12-26 Thread John Sorkin
I am trying to drop a column of a data frame. The code below attempts to drop a 
numeric column (which does not work but gives no error or warning) and a factor 
column (which does not work but gives an error).
I would appreciate someone telling me why my code does not work, and suggesting 
code that will work.
Thanks,
John

rm(dfxyz,dfxz,dfxy)

# create the data frame.
dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
dfxyz

names(dfxyz)

# try to drop y column
# does not work, does not produce error message
dfxz <- dfxyz[,-(dfxyz$y)]
dfxz

# try to drop z column
# does not work, produces error message:
# In Ops.factor(df$z) : '-' not meaningful for factors
dfxy <- dfxyz[,-dfxyz$z]
dfxy



John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
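
A short sketch of what seems to be going on and of some common ways to drop a column. In the code above, -(dfxyz$y) negates the values in y, so it asks R to drop columns 11 through 20; those columns do not exist, which matches the silent no-op reported. Unary minus is not defined for a factor, hence the error for z. Columns are dropped by name or by position, not by their contents:

```r
dfxyz <- data.frame(x = 1:10, y = 11:20, z = factor(c(rep(0, 5), rep(1, 5))))

# Drop by name with a logical mask over the column names.
dfxz <- dfxyz[, names(dfxyz) != "y"]

# Drop by position: a negative column index, looked up from the name.
dfxy <- dfxyz[, -match("z", names(dfxyz))]

# subset() accepts the unquoted column name with a minus sign.
dfxz2 <- subset(dfxyz, select = -y)

names(dfxz)
names(dfxy)
```

Assigning NULL, as in dfxyz$y <- NULL, removes the column in place instead of making a copy.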



Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Peter Ehlers

On 2010-12-26 08:26, Marius Hofert wrote:

Dear David,

thank you for your answer.
As I wrote, I am looking for an option to control the *space* between the tick 
marks and the corresponding labels. I am happy with the *number* of tick marks 
and their default values. As far as I know, pscales can't control the space, so 
it is *not* what I am looking for.


Marius,
I think that you mean something like the following:

library(lattice)
U <- matrix(runif(300), ncol = 3)
splom(U, par.settings = list(
    axis.components = list(
        left = list(pad1 = 3)
    )
  )
)

which will adjust the left axis; you'll have to add
right, top, bottom components to handle those as well.

Have a look at what trellis.par.get() produces and
check the axis.components section.
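
A sketch of the suggestion applied to all four sides, assuming the same pad1 component exists for each side of axis.components. The value 0.3 is only an illustrative choice for tightening the gap, and the object names pad and p are made up here:

```r
library(lattice)

# Inspect the current settings; each side has tck, pad1 and pad2 entries.
str(trellis.par.get("axis.components"))

# Use the same padding on every side (0.3 is an arbitrary example value).
pad <- list(pad1 = 0.3)
U <- matrix(runif(300), ncol = 3)
p <- splom(U, par.settings = list(
  axis.components = list(left = pad, right = pad, top = pad, bottom = pad)
))
print(p)
```

Passing the values through par.settings keeps the change local to this plot rather than altering the device-wide theme.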

Peter Ehlers



Cheers,

Marius

On 2010-12-26, at 14:36 , David Winsemius wrote:



On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:


Dear expeRts,

how can I decrease the space between the tick marks and the corresponding 
labels in an splom?
See here:

library(lattice)
U <- matrix(runif(4000), ncol = 8)
splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and 
tick marks is/seems to be too large


So you want more tick marks?



I checked ?panel.pairs but could not find an option for that.


What about the pscales argument?

A single number would increase the number of ticks, or a list with at and 
labels values can be passed. Seem to be just what you asked for.

--

David Winsemius, MD
West Hartford, CT





Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson
Thanks Spencer, I'll definitely have a look at this package and its  
vignettes.  I believe I have looked at it before, but didn't catch it  
on this particular search.  Bryan


On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:

p.s.  help(pac=CHNOSZ) reveals that this package has 3 vignettes.  I  
have not looked at these vignettes, but most vignettes provide  
excellent introductions (though rarely with complete coverage) of  
important capabilities of the package.  (The 'sos' package includes  
a vignette, which exposes more capabilities than the example below.)



##
 Have you considered the 'CHNOSZ' package?



makeup(C5H11BrO )

  count
C  5
H 11
Br 1
O  1


 I found this using the 'sos' package as follows:


library(sos)
cf - ???'chemical formula'
found 21 matches;  retrieving 2 pages
cf


 The print method for cf opened the results in a web browser,  
which showed that the CHNOSZ package had 14 of these 11 matches,  
and the other 7 were in 7 different packages.  Moreover, the  
CHNOSZ package is devoted to Chemical Thermodynamics and Activity  
Diagrams and provides many more capabilities that might interest you.



 Hope this helps.
 Spencer


On 12/26/2010 5:01 PM, Bryan Hanson wrote:
Well let me just say thanks and WOW!  Four great ideas, each worthy of
study and I'll learn several things from each.  Interestingly, these
solutions seem more general and more compact than the solutions I
found on the 'net using python and perl.  More evidence for the power
of R!  A big thanks to each of you!  Bryan

On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:

On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson han...@depauw.edu  
wrote:

Hello R Folks...

I've been looking around the 'net and I see many complex solutions in
various languages to this question, but I have a pretty simple need (and I'm
not much good at regex).  I want to use a chemical formula as a function
argument.  The formula would be in Hill order which is to list C, then H,
then all other elements in alphabetical order.  My example will have only a
limited number of elements, few enough that one can search directly for each
element.  So some examples would be C5H12, or C5H12O or C5H11BrO (note that
for oxygen and bromine, O or Br, there is no following number meaning a 1 is
implied).

Let's say

form <- "C5H11BrO"

I'd like to get the count of each element, so in this case I need to extract
C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular
weight by multiplying).  Sounds pretty simple, but my experiments with grep
and strsplit don't immediately clue me into an obvious solution.  As I said,
I don't need a general solution to the problem of calculating molecular
weight from an arbitrary formula, that seems quite challenging, just a way
to convert form into a list or data frame which I can then do the math on.

Here's hoping this is a simple issue for more experienced R users!  TIA,


This can be done by strapply in gsubfn.  It matches the regular
expression to the target string passing the back references (the
parenthesized portions of the regular expression) through a specified
function as successive arguments.

Thus the first arg is form, your input string.  The second arg is the
regular expression which matches an upper case letter optionally
followed by lower case letters and all that is optionally followed by
digits.  The third arg is a function shown in a formula
representation. strapply passes the back references (i.e. the portions
within parentheses) to the function as the two arguments.  Finally
simplify is another function in formula notation which turns the
result into a matrix and then a data frame.  Finally we make the
second column of the data frame numeric.

library(gsubfn)

DF <- strapply(form,
  "([A-Z][a-z]*)(\\d*)",
  ~ c(..1, if (nchar(..2)) ..2 else 1),
  simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
DF[[2]] <- as.numeric(DF[[2]])

DF looks like this:


DF

V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com







--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567





Re: [R] Drop column from a data frame

2010-12-26 Thread jim holtman
assign NULL to the column:

 dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
 dfxyz
x  y z
1   1 11 0
2   2 12 0
3   3 13 0
4   4 14 0
5   5 15 0
6   6 16 1
7   7 17 1
8   8 18 1
9   9 19 1
10 10 20 1
 dfxyz$y <- NULL
 dfxyz
x z
1   1 0
2   2 0
3   3 0
4   4 0
5   5 0
6   6 1
7   7 1
8   8 1
9   9 1
10 10 1
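
A short sketch of why the attempt quoted in this thread silently does nothing, and of two ways to drop a column by name rather than by its values. The object names other than dfxyz are chosen here for illustration.

```r
dfxyz <- data.frame(x = 1:10, y = 11:20,
                    z = factor(c(rep(0, 5), rep(1, 5))))

# -(dfxyz$y) is the vector -11, ..., -20: negative column positions that do
# not exist in a 3-column data frame, so nothing is dropped and no error is
# raised.
unchanged <- dfxyz[, -(dfxyz$y)]

# Drop by name instead of by the column's values.
dfxz <- dfxyz[, names(dfxyz) != "y"]          # logical index on the names
dfxy <- dfxyz[, -match("z", names(dfxyz))]    # negative column position
```

With a factor column the original attempt fails earlier, because unary minus is not defined for factors.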



On Sun, Dec 26, 2010 at 8:22 PM, John Sorkin
jsor...@grecc.umaryland.edu wrote:
 I am trying to drop a column of a data frame. The code below attempts to drop 
 a numeric column (which does not work but gives no error or warning) and a 
 factor column (which does not work but gives an error).
 I would appreciate someone telling me why my code does not work, and 
 suggesting code that will work.
 Thanks,
 John

 rm(dfxyz,dfxz,dfxy)

 # create the data frame.
 dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
 dfxyz

 names(dfxyz)

 # try to drop y column
 # does not work, does not produce error message
 dfxz <- dfxyz[,-(dfxyz$y)]
 dfxz

 # try to drop z column
 # does not work, produces error message:
 # In Ops.factor(df$z) : '-' not meaningful for factors
 dfxy <- dfxyz[,-dfxyz$z]
 dfxy



 John David Sorkin M.D., Ph.D.
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing)



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Mike Marchywka

I think the OP had a very limited need but there is something
more sophisticated that may be of larger interest called SMILES
which attempts to capture some structural information about a molecule
in a text string. Reducing pictures to tractable text is an important step
in many analysis efforts and I was curious what others may be able to say about
R support for things like this.

A quick google search turned up this, 

http://cran.r-project.org/web/packages/rpubchem/rpubchem.pdf

but I wasn't sure if there are more packages for manipulating
different ball and stick collections (the atom and bond descriptions
could just as easily represent any other collection of nodes
and connections).

You can get some idea what this does by typing your favorite chemical
name here,

http://pubchem.ncbi.nlm.nih.gov/

and the entries give something called Canonical SMILES structures
For example, 

http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=8030loc=ec_rcs


IUPAC Name: thiophene
Canonical SMILES: C1=CSC=C1
InChI: InChI=1S/C4H4S/c1-2-4-5-3-1/h1-4H
InChIKey: YTPLMLYBLZKORZ-UHFFFAOYSA-N




Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 8:28 PM, Bryan Hanson wrote:

Thanks Spencer, I'll definitely have a look at this package and its
vignettes.  I believe I have looked at it before, but didn't catch  
it on this particular search.  Bryan


Using the thermo list that the makeup function accesses to get its
valid atomic symbols one can arrive at the answer you posited
would be too difficult in your first posting, the atomic weight from
the formulae:


 str(thermo$element)
'data.frame':   130 obs. of  6 variables:
 $ element: chr  "Z" "O" "H" "He" ...
 $ state  : chr  "aq" "gas" "gas" "gas" ...
 $ source : chr  "CWM89" "CWM89" "CWM89" "CWM89" ...
 $ mass   : num  0 16 1.01 4 20.18 ...
 $ s      : num  -15.6 49 31.2 30.2 35 ...
 $ n      : int  1 2 2 1 1 1 1 1 2 2 ...

patts <- paste("^", rownames(makeup(form)), "$", sep="")
makuform <- makeup(form)
makuform$amass <- sapply(patts, function(x) {return( thermo$element[
    grep(x, thermo$element[[1]])[1], "mass"])}  )

sum(makuform$amass *makuform$count)
# [1] 167.0457
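
The same arithmetic can be sketched without the CHNOSZ data, assuming a small hand-typed lookup table. The table below is hypothetical (rounded standard atomic masses for just these four elements), so the result is close to, but not identical with, the figure above.

```r
# Element counts for C5H11BrO, as produced by the parsers in this thread.
counts <- c(C = 5, H = 11, Br = 1, O = 1)

# Hypothetical lookup table of abundance-weighted atomic masses.
atomic_mass <- c(H = 1.008, C = 12.011, O = 15.999, Br = 79.904)

# Index the table by element name so the order of 'counts' does not matter.
mol_weight <- sum(counts * atomic_mass[names(counts)])
mol_weight
```

Indexing by name rather than by position avoids the grep() step, since each symbol is matched exactly.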




[R] filled.contour colors

2010-12-26 Thread randhindi

Hi,

I am trying to set the color scale in filled.contour based on a specific
value instead of a relative position.
Specifically, I want the values below 0 to be in a gradient of green, and
those above 0 to be red. 0 would be white.

I tried:

posZero = abs(min(z)) / (abs(min(z)) + max(z));
filled.contour(..., col = designer.colors(n=30, col=c("green", "white",
"red"), x=c(0, posZero, 1)))

but it does not center the white on the zero.

Thanks for your help,

Rand
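
One possible approach, sketched here with base R only (colorRampPalette instead of designer.colors, which is a swap made for this example and not something from the thread): make the contour levels symmetric around zero, so the middle of the palette always falls at 0. The volcano data set is used as a stand-in for z.

```r
# Example surface with both negative and positive values (volcano ships with R).
z <- volcano - mean(volcano)

# Symmetric breaks around zero put the middle of the palette exactly at 0.
n_col <- 30                                   # even, so 0 is itself a break
lim   <- max(abs(z))
levs  <- seq(-lim, lim, length.out = n_col + 1)
cols  <- colorRampPalette(c("green", "white", "red"))(n_col)

filled.contour(z, levels = levs, col = cols)
```

The trade-off is that when the data are far from symmetric, part of one end of the palette goes unused.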


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson

Hi David & others...

I did find the function you recommended, plus, it's even easier (but a  
little hidden in the doc): element(form, "mass").  But, this uses the  
atomic masses from the periodic table, which are weighted averages of  
the isotopes of each element.  What I'm doing actually involves mass  
spectrometry, so I need the isotope masses, which are integers (think  
12C, 13C, 14C, but the periodic table says 12.011 reflecting the  
relative abundances).  I used Gabor's solution and got my little  
function humming.  Plus, I have several things to read through from  
the various recommendations.


Thanks again, Bryan
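
The distinction drawn above can be sketched as follows, assuming a hand-typed table of mass numbers for one chosen isotope per element. The table is hypothetical and covers only this example.

```r
counts <- c(C = 5, H = 11, Br = 1, O = 1)

# Hypothetical table: mass number of one chosen isotope per element.
isotope_mass <- c(C = 12, H = 1, O = 16, Br = 79)

nominal_mass <- sum(counts * isotope_mass[names(counts)])
nominal_mass   # 5*12 + 11*1 + 79 + 16 = 166
```

Swapping in Br = 81 gives the value for the other stable bromine isotope, which is the kind of per-isotope choice an averaged periodic-table mass cannot express.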


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Gabor Grothendieck
On Sun, Dec 26, 2010 at 7:26 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson han...@depauw.edu wrote:
 Hello R Folks...

 I've been looking around the 'net and I see many complex solutions in
 various languages to this question, but I have a pretty simple need (and I'm
 not much good at regex).  I want to use a chemical formula as a function
 argument.  The formula would be in Hill order which is to list C, then H,
 then all other elements in alphabetical order.  My example will have only a
 limited number of elements, few enough that one can search directly for each
 element.  So some examples would be C5H12, or C5H12O or C5H11BrO (note that
 for oxygen and bromine, O or Br, there is no following number meaning a 1 is
 implied).

 Let's say

 form <- "C5H11BrO"

 I'd like to get the count of each element, so in this case I need to extract
 C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular
 weight by multiplying).  Sounds pretty simple, but my experiments with grep
 and strsplit don't immediately clue me into an obvious solution.  As I said,
 I don't need a general solution to the problem of calculating molecular
 weight from an arbitrary formula, that seems quite challenging, just a way
 to convert form into a list or data frame which I can then do the math on.

 Here's hoping this is a simple issue for more experienced R users!  TIA,

 This can be done by strapply in gsubfn.  It matches the regular
 expression to the target string passing the back references (the
 parenthesized portions of the regular expression) through a specified
 function as successive arguments.

 Thus the first arg is form, your input string.  The second arg is the
 regular expression which matches an upper case letter optionally
 followed by lower case letters and all that is optionally followed by
 digits.  The third arg is a function shown in a formula
 representation. strapply passes the back references (i.e. the portions
 within parentheses) to the function as the two arguments.  Finally
 simplify is another function in formula notation which turns the
 result into a matrix and then a data frame.  Finally we make the
 second column of the data frame numeric.

 library(gsubfn)

 DF <- strapply(form,
   "([A-Z][a-z]*)(\\d*)",
   ~ c(..1, if (nchar(..2)) ..2 else 1),
   simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
 DF[[2]] <- as.numeric(DF[[2]])

 DF looks like this:

 DF
  V1 V2
 1  C  5
 2  H 11
 3 Br  1
 4  O  1


Here is a variation that is slightly simpler. The function in the
third argument has been changed from c to paste so that it outputs
strings like C 5.  With this form of output we can use read.table to
read it directly creating a data frame.

> strapply(form,
+   "([A-Z][a-z]*)(\\d*)",
+   ~ paste(..1, if (nchar(..2)) ..2 else 1),
+   simplify = ~ read.table(textConnection(..1)))
  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1
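
For readers without gsubfn, the same regular-expression idea can be sketched in base R. This is an illustrative alternative, not the method used above; the function name parse_formula is made up here.

```r
parse_formula <- function(form) {
  # Each token is an element symbol (capital letter plus optional lower-case
  # letters) followed by optional digits.
  tokens  <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]
  element <- sub("[0-9]*$", "", tokens)      # strip the trailing count
  count   <- sub("^[A-Za-z]+", "", tokens)   # strip the leading symbol
  count[count == ""] <- "1"                  # a missing number implies 1
  data.frame(element = element, count = as.numeric(count),
             stringsAsFactors = FALSE)
}

res <- parse_formula("C5H11BrO")
res
```

Like the strapply version, this handles only flat formulas; parentheses, charges and hydrates are out of scope.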


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Spencer Graves
  Mike Marchywka's post mentioned a CRAN package, rpubchem, 
missed by my search for "chemical formula".  A further search for 
"chemical" and "chemistry" still missed it.  "compound" found it.  
Adding "compounds" and combining them with "union" produced a list of 
564 links in 219 packages;  7 of the help pages were for rpubchem.  
The package with the most matches is seacarb (seawater carbonate 
chemistry with R:  21 matches), followed by CHNOSZ, previously 
mentioned (19 matches).   rpubchem is the 22nd package on this list (5 
matches, with a max score of 32, less than the max score of 2 other 
packages with 5 matches).



  Spencer



[R] modifying user agent strings in http requests

2010-12-26 Thread Soumendra
Hi all.

How does one change user agent strings in http requests made in R? And
how do I figure out what my current user agent string looks like?

Thanks in advance,

Soumendra
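
For requests made by base R itself (download.file() and url()), the agent string is held in the HTTPUserAgent option, so a minimal sketch of both parts of the question looks like this. The replacement string is made up, and add-on HTTP packages may use their own settings instead of this option.

```r
# The string sent with download.file() and url() requests lives in an option.
old_ua <- getOption("HTTPUserAgent")
print(old_ua)

# Replace it for the rest of the session; this agent string is made up.
options(HTTPUserAgent = "my-r-script/0.1")
new_ua <- getOption("HTTPUserAgent")

# Restore the previous value when done.
options(HTTPUserAgent = old_ua)
```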

--
Don't worry about people stealing your ideas. If your ideas are any
good, you'll have to ram them down people's throats.
                                                           ---  Howard Aiken



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Dennis Murphy
Hi:


On Sun, Dec 26, 2010 at 4:18 AM, bbslover dlu...@yeah.net wrote:


 Dear all,

 My double for loop is as follows, but it is not very efficient. I hope
 someone can give me a vectorized program to replace my code. Thanks

 x: is a 202*263 matrix, that is, 202 samples and 263 independent
 variables

 num.compd <- nrow(x)  # number of compounds
 diss.all <- 0
 for (i in 1:num.compd)
   for (j in 1:num.compd)
     if (i != j) {


Isn't this just X'X?

S1 <- sum(x[i,]*x[j,])

Aren't each of S2 and S3 just diag(X'X)?

S2 <- sum(x[i,]^2)

S3 <- sum(x[j,]^2)
sim2 <- S1/(S2+S3-S1)
diss2 <- 1-sim2
diss.all <- diss.all+diss2}


I tried
s1 <- crossprod(x)
s2 <- diag(s1)
s3 <- outer(s2, s2, '+') - s1
s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities
between 0 and 1 in the off-diagonal. Something like it could conceivably be
used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved
with some matrix algebra. R can do matrix algebra quite efficiently,
y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:
system.time(myfunc(x))
   user  system elapsed
   0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew
what you were computing... )

HTH,
Dennis
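
A sketch of the matrix-algebra route checked against the original loop. Because the loop in the question indexes rows of x (one row per compound), this version uses tcrossprod(x), i.e. x %*% t(x); crossprod(x) would compare columns instead. The function names and the small test matrix are made up for illustration.

```r
# Double loop from the question, wrapped in a function for comparison.
diss_loop <- function(x) {
  total <- 0
  for (i in seq_len(nrow(x))) for (j in seq_len(nrow(x))) if (i != j) {
    s1 <- sum(x[i, ] * x[j, ])
    s2 <- sum(x[i, ]^2)
    s3 <- sum(x[j, ]^2)
    total <- total + (1 - s1 / (s2 + s3 - s1))
  }
  total
}

# Matrix version: every pairwise quantity is computed at once.
diss_matrix <- function(x) {
  s1 <- tcrossprod(x)                     # s1[i, j] = sum(x[i, ] * x[j, ])
  sq <- diag(s1)                          # sq[i]    = sum(x[i, ]^2)
  diss <- 1 - s1 / (outer(sq, sq, "+") - s1)
  diag(diss) <- 0                         # the loop skips i == j
  sum(diss)
}

set.seed(42)
x <- matrix(runif(20 * 6), nrow = 20)
all.equal(diss_loop(x), diss_matrix(x))
```

For the 202 x 263 matrix in the question the intermediate matrices are only 202 x 202, so memory is not a concern.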

It takes a long time to finish this computation! I really need faster
 code to replace my code.

 thanks

 kevin




Re: [R] Drop column from a data frame

2010-12-26 Thread Phil Spector

John -
   You can use a syntax similar to what you've tried with
the select= argument of the subset function:


subset(dfxyz,select=-y)

    x z
1   1 0
2   2 0
  . . .

subset(dfxyz,select=-z)

    x  y
1   1 11
2   2 12
  . . .
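
As for why the original attempts did nothing, here is a minimal sketch, assuming the data frame from the question. A negative index drops columns by position, but -(dfxyz$y) is -(11:20), and negative positions beyond the last column are silently ignored, so nothing is dropped. The alternatives shown are plain base R; the result names (same, dfxz, dfxy, dfx_z, dfcopy) are only illustrative.

```r
dfxyz <- data.frame(x = 1:10, y = 11:20, z = factor(c(rep(0, 5), rep(1, 5))))

# -(dfxyz$y) evaluates to -(11:20); there are only 3 columns, so all are kept
same <- dfxyz[, -(dfxyz$y)]
names(same)

# Select by name instead of by the column's values
dfxz  <- dfxyz[, names(dfxyz) != "y"]           # logical mask over column names
dfxy  <- dfxyz[, setdiff(names(dfxyz), "z")]    # every column except z
dfx_z <- dfxyz[, -which(names(dfxyz) == "y")]   # negative position of y

# Or remove the column from a copy in place
dfcopy <- dfxyz
dfcopy$y <- NULL
```

The factor case fails earlier because unary minus is not defined for factors, which is what the Ops.factor message reports.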


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Sun, 26 Dec 2010, John Sorkin wrote:


I am trying to drop a column of a data frame. The code below attempts to drop a 
numeric column (which does not work but gives no error or warning) and a factor 
column (which does not work but gives an error).
I would appreciate someone telling me why my code does not work, and suggesting 
code that will work.
Thanks,
John

rm(dfxyz,dfxz,dfxy)

# create the data frame.
dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
dfxyz

names(dfxyz)

# try to drop y column
# does not work, does not produce error message
dfxz <- dfxyz[,-(dfxyz$y)]
dfxz

# try to drop z column
# does not work, produces error message:
# In Ops.factor(df$z) : - not meaningful for factors
dfxy <- dfxyz[,-dfxyz$z]
dfxy



John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



Re: [R] R2WinBugs data import error

2010-12-26 Thread unsown

You solved my problem, thank you.

As you said, it was the type of the content in the matrix that caused the
problem.
I needed to put variable x, along with other variables, into the list; it
turned out that x must be given as a character in the statement:
dat <- list(x,otherVariables)

Anyway, my code works well now. Thanks for your help.
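
For readers landing here, a guess at what "in form of character" refers to, since the original exchange is not shown: the data argument of bugs() in R2WinBUGS is documented as either a named list of values or a vector/list of the names of the data objects. An unnamed list of values matches neither form. The sketch below only builds the two forms in base R and does not call bugs(); the objects x and N are made up.

```r
# Toy data objects (hypothetical, not from the thread)
x <- c(1.2, 3.4, 5.6)
N <- length(x)

# Form 1: a named list holding the values themselves
dat_named <- list(x = x, N = N)

# Form 2: the names of the data objects, given as character strings
dat_names <- list("x", "N")

# Looking the names up in the current environment recovers form 1
resolved <- mget(unlist(dat_names), envir = environment())
identical(resolved, dat_named)
```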
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R2WinBugs-data-import-error-tp3164106p3164707.html
Sent from the R help mailing list archive at Nabble.com.



[R] package update

2010-12-26 Thread eric

I'm running Linux Ubuntu and tried to update my packages using the
update.packages() command. It appeared to download the updates OK, but I got
the following message:


The downloaded packages are in ‘/tmp/RtmpFM82Ry/downloaded_packages’
Warning in install.packages(update[instlib == l, Package], l, contriburl =
contriburl,  :
  'lib = /usr/lib/R/site-library' is not writable
Error in install.packages(update[instlib == l, Package], l, contriburl =
contriburl,  : 
  unable to install packages
Calls: update.packages -> install.packages

What does this mean? And more importantly, how do I address it?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/package-update-tp3164690p3164690.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

Thanks for your help, it is great. In addition: at the beginning x was a
data frame and my code ran very slowly; after your help I changed x to a
matrix, and now it is very quick. I am very grateful for your kind help, and
your code is so good!

kevin
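
A small sketch of the conversion described above, on made-up data of the same shape (202 x 263). Indexing a row of a data frame goes through data-frame methods on every access, while a numeric matrix row is a plain vector, which is the likely reason the loop sped up. No timings are quoted here because they depend on the machine.

```r
# Toy stand-in for the poster's data: 202 samples, 263 variables
set.seed(1)
df <- as.data.frame(matrix(runif(202 * 263), nrow = 202))

# Convert once, before any looping or matrix algebra
x <- as.matrix(df)
is.matrix(x)

# Compare row access cost for the two representations
t_df  <- system.time(for (i in 1:202) df[i, ])
t_mat <- system.time(for (i in 1:202) x[i, ])
print(rbind(data.frame = t_df, matrix = t_mat)[, "elapsed", drop = FALSE])
```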
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164732.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

Thanks for your help. I am sorry, I do not fully understand your code, so I
cannot correctly apply it to my data. My data is in the attachment, and
what I want to compute is the equation in the Word document in the
attachment:

The code from Berend gives the answer I want.

http://r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar 


-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164741.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] package update

2010-12-26 Thread Joshua Wiley
Either switch the library path to a writable directory, or run R as root
(via su or sudo) so that you have the necessary permissions.

Cheers,

Josh
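
A minimal sketch of the first option, assuming a per-user library is acceptable. R_LIBS_USER is the environment variable R normally sets to the personal library location; the tempdir() fallback is only there so the sketch runs anywhere and is not something from this thread. The update call itself is left commented out because it downloads and installs packages.

```r
# Personal library location (normally set by R at startup)
user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
if (!nzchar(user_lib)) {
  # Fallback for this sketch only (an assumption, not a recommendation)
  user_lib <- file.path(tempdir(), "R-library")
}

# Create it if needed and put it first on the library search path
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(user_lib, .libPaths()))

# Show which libraries the current user can write to
writable <- file.access(.libPaths(), mode = 2) == 0
print(data.frame(lib = .libPaths(), writable = writable))

# Install updated versions into the writable library instead of site-library:
# update.packages(instlib = user_lib, ask = FALSE)
```

With the second option, updates land in /usr/lib/R/site-library itself, which requires starting R with root privileges.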

On Dec 26, 2010, at 20:45, eric ericst...@aol.com wrote:

 
 I'm running Linux Ubuntu and tried to update my packages using the
 update.packages() command. It appeared to download the updates OK, but I got
 the following message:
 
 
 The downloaded packages are in ‘/tmp/RtmpFM82Ry/downloaded_packages’
 Warning in install.packages(update[instlib == l, Package], l, contriburl =
 contriburl,  :
  'lib = /usr/lib/R/site-library' is not writable
 Error in install.packages(update[instlib == l, Package], l, contriburl =
 contriburl,  : 
  unable to install packages
 Calls: update.packages -> install.packages
 
 What does this mean? And more importantly, how do I address it?


Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Marius Hofert
Dear Peter,

thank you very much, *precisely* what I was looking for!

Cheers,

Marius

On 2010-12-27, at 02:27 , Peter Ehlers wrote:

 On 2010-12-26 08:26, Marius Hofert wrote:
 Dear David,
 
 thank you for your answer.
 As I wrote, I am looking for an option to control the *space* between the 
 tick marks and the corresponding labels. I am happy with the *number* of 
 tick marks and their default values. As far as I know, pscales can't control 
 the space, so it is *not* what I am looking for.
 
 Marius,
 I think that you mean something like the following:
 
 U <- matrix(runif(300), ncol = 3)
 splom(U, par.settings = list(
   axis.components = list(
     left = list(pad1 = 3)
   )
 ))
 
 which will adjust the left axis; you'll have to add
 right, top, bottom components to handle those as well.
 
 Have a look at what trellis.par.get() produces and
 check the axis.components section.
 
 Peter Ehlers
 
 
 Cheers,
 
 Marius
 
 On 2010-12-26, at 14:36 , David Winsemius wrote:
 
 
 On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:
 
 Dear expeRts,
 
 how can I decrease the space between the tick marks and the corresponding 
 labels in an splom?
 See here:
 
 library(lattice)
 U <- matrix(runif(4000), ncol = 8)
 splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels
                               #    and tick marks is/seems to be too large
 
 So you want more tick marks?
 
 
 I checked ?panel.pairs but could not find an option for that.
 
 What about the pscales argument?
 
 A single number would increase the number of ticks, or a list with at and 
 labels values can be passed. Seem to be just what you asked for.
 
 --
 
 David Winsemius, MD
 West Hartford, CT
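
For the archive, a sketch of Peter's suggestion extended to all four sides, as he describes. The value 0.3 is purely illustrative (chosen below 1 to reduce the gap, since the original question asked for less space); the variable names pad and p are made up.

```r
library(lattice)

U <- matrix(runif(300), ncol = 3)

# Same padding setting reused for every side; 0.3 is an arbitrary example
pad <- list(pad1 = 0.3)

p <- splom(U, par.settings = list(
  axis.components = list(left = pad, right = pad, top = pad, bottom = pad)
))
print(p)

# Inspect the current settings, as suggested in the thread
str(trellis.par.get("axis.components"))
```

Passing the settings through par.settings keeps the change local to this one plot rather than altering the session-wide trellis settings.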
 



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Berend Hasselman


djmuseR wrote:
 
 On Sun, Dec 26, 2010 at 4:18 AM, bbslover dlu...@yeah.net wrote:
 

 x is a 202 x 263 matrix, that is, 202 samples and 263 independent
 variables.

 num.compd <- nrow(x)  # number of compounds
 diss.all <- 0
 for (i in 1:num.compd)
   for (j in 1:num.compd)
     if (i != j) {

 
 Isn't this just X'X?
 
 S1 <- sum(x[i,]*x[j,])

 Aren't each of S2 and S3 just diag(X'X)?
 
 S2 <- sum(x[i,]^2)

 S3 <- sum(x[j,]^2)
 sim2 <- S1/(S2+S3-S1)
 diss2 <- 1-sim2
 diss.all <- diss.all+diss2}

 
 I tried
 s1 <- crossprod(x)
 s2 <- diag(s1)
 s3 <- outer(s2, s2, '+') - s1
 s1/s3
 
 This yields a symmetric matrix with 1's along the diagonal and quantities
 between 0 and 1 in the off-diagonal. Something like it could conceivably be
 used as a similarity matrix. Is that what you're looking for with sim2?
 
 I agree with Berend: it looks like a problem that could be easily solved
 with some matrix algebra. R can do matrix algebra quite efficiently,
 y'know...
 
 (BTW, I tried this on a 1000 x 1000 input matrix:
 system.time(myfunc(x))
    user  system elapsed
    0.99    0.02    1.02
 
 I expect it could be improved by an order of magnitude if one actually knew
 what you were computing... )
 

I did some more work along Dennis' lines

xtx <- tcrossprod(x)
xtd <- diag(xtx)
xzz <- outer(xtd,xtd,'+')
zz  <- 1 - xtx/(xzz-xtx)
diss.all <- sum(zz)

this appears to give the desired result and it's quite a bit faster than my
alternative 2.
It would indeed be nice to know what is being computed.
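
A quick check on a small random matrix, as a sketch, that the matrix version reproduces the double loop. The diagonal needs no special handling: for i == j the ratio is S1/(S1+S1-S1) = 1, so the corresponding entry of zz is 0 and sum(zz) equals the loop's sum over i != j. The function names diss_loop and diss_vec are made up for this comparison. As a guess not confirmed in the thread, S1/(S2+S3-S1) has the form of a Tanimoto-type similarity between rows.

```r
# Double loop from the original post, wrapped in a function
diss_loop <- function(x) {
  n <- nrow(x)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        S1 <- sum(x[i, ] * x[j, ])
        S2 <- sum(x[i, ]^2)
        S3 <- sum(x[j, ]^2)
        total <- total + 1 - S1 / (S2 + S3 - S1)
      }
    }
  }
  total
}

# Matrix-algebra version using tcrossprod, following the code above
diss_vec <- function(x) {
  xtx <- tcrossprod(x)            # all row-by-row inner products
  xtd <- diag(xtx)                # squared row norms
  sum(1 - xtx / (outer(xtd, xtd, "+") - xtx))
}

set.seed(123)
x <- matrix(runif(20 * 5), nrow = 20)   # toy data: 20 rows, 5 columns
all.equal(diss_loop(x), diss_vec(x))
```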

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164755.html
Sent from the R help mailing list archive at Nabble.com.
