Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Berend Hasselman


djmuseR wrote:
> 
> On Sun, Dec 26, 2010 at 4:18 AM, bbslover  wrote:
> 
>>
>> x: is a matrix  202*263,  that is 202 samples, and 263 independent
>> variables
>>
>> num.compd<-nrow(x); # number of compounds
>> diss.all<-0
>> for( i in 1:num.compd)
>>   for (j in 1:num.compd)
>>  if (i!=j) {
>>
> 
> Isn't this just X'X?
> 
>>        S1<-sum(x[i,]*x[j,])
>>
> Aren't each of S2 and S3 just diag(X'X)?
> 
>>        S2<-sum(x[i,]^2)
>>        S3<-sum(x[j,]^2)
>>        sim2<-S1/(S2+S3-S1)
>>        diss2<-1-sim2
>>        diss.all<-diss.all+diss2}
>>
> 
> I tried
> s1 <- crossprod(x)
> s2 <- diag(s1)
> s3 <-outer(s2, s2, '+') - s1
> s1/s3
> 
> This yields a symmetric matrix with 1's along the diagonal and quantities
> between 0 and 1 in the off-diagonal. Something like it could conceivably
> be
> used as a similarity matrix. Is that what you're looking for with sim2?
> 
> I agree with Berend: it looks like a problem that could be easily solved
> with some matrix algebra. R can do matrix algebra quite efficiently,
> y'know...
> 
> (BTW, I tried this on a 1000 x 1000 input matrix:
> system.time(myfunc(x))
>    user  system elapsed
>    0.99    0.02    1.02
> 
> I expect it could be improved by an order of magnitude if one actually
> knew what you were computing... )
> 

I did some more work along Dennis' lines

xtx <- tcrossprod(x)
xtd <- diag(xtx)
xzz <- outer(xtd,xtd,'+')
zz  <- 1 - xtx/(xzz-xtx)
diss.all <- sum(zz)

This appears to give the desired result, and it is quite a bit faster than my
alternative 2.
It would indeed be nice to know what is being computed.
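
For anyone following along, the loop and the matrix version can be checked
against each other on a small random matrix. This is only a sketch:
loop_diss and vec_diss are names made up here, the test matrix is random
rather than the poster's data, and it assumes no row of x is all zeros
(otherwise the denominator vanishes). The ratio S1/(S2+S3-S1) has the form
usually called the Tanimoto coefficient for continuous variables, which may
be what is being computed.

```r
set.seed(1)
x <- matrix(runif(20 * 5), nrow = 20)   # 20 samples, 5 variables

# The original double loop, wrapped in a function.
loop_diss <- function(x) {
  n <- nrow(x)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        s1 <- sum(x[i, ] * x[j, ])
        s2 <- sum(x[i, ]^2)
        s3 <- sum(x[j, ]^2)
        total <- total + (1 - s1 / (s2 + s3 - s1))
      }
    }
  }
  total
}

# The matrix version: tcrossprod(x) holds all row-by-row inner products.
vec_diss <- function(x) {
  xxt <- tcrossprod(x)
  d   <- diag(xxt)
  # Diagonal entries are 1 - d/(2d - d) = 0, so they add nothing to the sum.
  sum(1 - xxt / (outer(d, d, "+") - xxt))
}

all.equal(loop_diss(x), vec_diss(x))
```

The diagonal terms vanish on their own, which is why the matrix version does
not need an explicit i != j test.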

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164755.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Marius Hofert
Dear Peter,

thank you very much, *precisely* what I was looking for!

Cheers,

Marius

On 2010-12-27, at 02:27 , Peter Ehlers wrote:

> On 2010-12-26 08:26, Marius Hofert wrote:
>> Dear David,
>> 
>> thank you for your answer.
>> As I wrote, I am looking for an option to control the *space* between the 
>> tick marks and the corresponding labels. I am happy with the *number* of 
>> tick marks and their default values. As far as I know, pscales can't control 
>> the space, so it is *not* what I am looking for.
> 
> Marius,
> I think that you mean something like the following:
> 
> U <- matrix(runif(300), ncol = 3)
> splom(U, par.settings = list(
>          axis.components = list(
>            left = list(pad1 = 3)
>          )
>        )
> )
> 
> which will adjust the left axis; you'll have to add
> right, top, bottom components to handle those as well.
> 
> Have a look at what trellis.par.get() produces and
> check the axis.components section.
> 
> Peter Ehlers
> 
> 
>> Cheers,
>> 
>> Marius
>> 
>> On 2010-12-26, at 14:36 , David Winsemius wrote:
>> 
>>> 
>>> On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:
>>> 
>>>> Dear expeRts,
>>>> 
>>>> how can I decrease the space between the tick marks and the corresponding 
>>>> labels in an splom?
>>>> See here:
>>>> 
>>>> library(lattice)
>>>> U<- matrix(runif(4000), ncol = 8)
>>>> splom(U, axis.text.cex = 0.2) # =>  space between the [small] tick labels 
>>>> and tick marks is/seems to be too large
>>> 
>>> So you want more tick marks?
>>> 
 
>>>> I checked ?panel.pairs but could not find an option for that.
>>> 
>>> What about the pscales argument?
>>> 
>>> A single number would increase the number of ticks, or a list with "at" and 
>>> "labels" values can be passed. Seem to be just what you asked for.
>>> 
>>> --
>>> 
>>> David Winsemius, MD
>>> West Hartford, CT
>>> 
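
A sketch of Peter's suggestion extended to all four sides, as he describes.
The value 0.3 is an arbitrary illustration (values below the default of 1
should tighten the gap, values above it widen it), and the object names pad
and p are made up here.

```r
library(lattice)

U <- matrix(runif(300), ncol = 3)

# One padding setting reused for every side of the panels.
pad <- list(pad1 = 0.3)

p <- splom(U, par.settings = list(
  axis.components = list(left = pad, right = pad, top = pad, bottom = pad)
))
print(p)
```

Running trellis.par.get("axis.components") shows the current values for each
side, which is a convenient starting point for tuning.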



Re: [R] package update

2010-12-26 Thread Joshua Wiley
Either switch the library path to a writable directory, or run R as root
(via su or sudo) so that you have the necessary permissions.

Cheers,

Josh
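
A minimal sketch of the first option, assuming a personal library directory
is acceptable. The helper name use_writable_lib and the path in the usage
comment are hypothetical; update.packages() itself is left commented out
because it needs network access.

```r
# Create a directory (if needed) and put it first on the library search path.
use_writable_lib <- function(lib) {
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib, .libPaths()))
  invisible(.libPaths()[1])
}

# Usage (hypothetical path):
# use_writable_lib("~/R/library")
# update.packages(instlib = .libPaths()[1], ask = FALSE)
```

With instlib set, updated versions are installed into the writable library
instead of /usr/lib/R/site-library, so no root access is needed.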

On Dec 26, 2010, at 20:45, eric  wrote:

> 
> I'm running Linux Ubuntu and tried to update my packages using the
> update.packages() command. It appeared to download the updates ok but I got
> the following message:
> 
> 
> The downloaded packages are in ‘/tmp/RtmpFM82Ry/downloaded_packages’
> Warning in install.packages(update[instlib == l, "Package"], l, contriburl =
> contriburl,  :
>  'lib = "/usr/lib/R/site-library"' is not writable
> Error in install.packages(update[instlib == l, "Package"], l, contriburl =
> contriburl,  : 
>  unable to install packages
> Calls: update.packages -> install.packages
> 
> What does this mean ? And more importantly, how do I address it ?



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

Thanks for your help. I am sorry, I do not fully understand your code, so I
cannot correctly apply it to my data. My data are attached, and what I want
to compute is the equation in the Word document in the attachment:

The code from Berend gives the answer I want.

http://r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar 


-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164741.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

Thanks for your help, it is great. In addition: at the beginning x was a
data frame and my code ran very slowly; after your help I changed x to a
matrix, and now it is very quick. I am very grateful for your kind help, and
your code is so good!

kevin
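
The speed difference is plausibly because extracting a row from a data frame
(x[i, ]) is much more expensive than extracting a row from a matrix. A
minimal sketch of the conversion, using a made-up data frame with the same
dimensions as described in the thread:

```r
# Hypothetical stand-in for the poster's data: 202 samples, 263 variables.
df <- as.data.frame(matrix(runif(202 * 263), nrow = 202))

# Convert once, up front; all columns must be numeric for this to give
# a numeric matrix.
x <- as.matrix(df)
stopifnot(is.matrix(x), is.numeric(x))

xtx <- tcrossprod(x)   # row-by-row inner products
dim(xtx)
```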
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164732.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R2WinBugs data import error

2010-12-26 Thread unsown

You solved my problem, thank you.

As you said, it was the type of the content in the matrix that caused the
problem. 
I needed to put variable x along with the other variables into the list; it
turned out that x must be given as a character string (its name) in the
statement:
dat <- list("x","otherVariables")

Anyway, my code works well now. Thanks for your help.
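
For readers hitting the same error, a small sketch of the two ways the data
can be described, assuming (as I recall from the R2WinBUGS documentation)
that the data argument of bugs() accepts either a list of object names or a
named list of values. The objects x and N are made up for illustration, and
bugs() is not called here.

```r
# Hypothetical data: a small numeric matrix and its row count.
x <- matrix(c(1.5, 2.0, 3.5, 4.0), nrow = 2)
N <- nrow(x)

# Form 1: names of objects, given as character strings.
dat_names <- list("x", "N")

# Form 2: a named list holding the values themselves.
dat_named <- list(x = x, N = N)

# Whichever form is used, the contents should be numeric, not character.
stopifnot(is.numeric(x), is.numeric(N))
```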
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R2WinBugs-data-import-error-tp3164106p3164707.html
Sent from the R help mailing list archive at Nabble.com.



[R] package update

2010-12-26 Thread eric

I'm running Linux Ubuntu and tried to update my packages using the
update.packages() command. It appeared to download the updates ok but I got
the following message:


The downloaded packages are in ‘/tmp/RtmpFM82Ry/downloaded_packages’
Warning in install.packages(update[instlib == l, "Package"], l, contriburl =
contriburl,  :
  'lib = "/usr/lib/R/site-library"' is not writable
Error in install.packages(update[instlib == l, "Package"], l, contriburl =
contriburl,  : 
  unable to install packages
Calls: update.packages -> install.packages

What does this mean ? And more importantly, how do I address it ?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/package-update-tp3164690p3164690.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Drop column from a data frame

2010-12-26 Thread Phil Spector

John -
   You can use a syntax similar to what you've tried with
the select= argument of the subset function:


> subset(dfxyz,select=-y)
    x z
1   1 0
2   2 0
  . . .

> subset(dfxyz,select=-z)
    x  y
1   1 11
2   2 12
  . . .


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu
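
As for why the original attempts fail: dfxyz[,-(dfxyz$y)] negates the values
of y, so it asks R to drop columns 11 to 20, which do not exist, and nothing
happens; unary minus is simply not defined for a factor such as z. A short
sketch of a few alternatives that select by column name (the result names
dfxz, dfxy and dfxz2 are only for illustration):

```r
dfxyz <- data.frame(x = 1:10, y = 11:20,
                    z = factor(c(rep(0, 5), rep(1, 5))))

# Keep every column whose name is not "y".
dfxz <- dfxyz[, names(dfxyz) != "y"]

# Keep all names except "z".
dfxy <- dfxyz[, setdiff(names(dfxyz), "z")]

# The subset() form from the reply above.
dfxz2 <- subset(dfxyz, select = -y)

names(dfxz)
names(dfxy)
```

Assigning NULL (for example dfxyz$y <- NULL) also removes a column, but it
modifies the data frame in place rather than returning a copy.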


On Sun, 26 Dec 2010, John Sorkin wrote:


I am trying to drop a column of a data frame. The code below attempts to drop a 
numeric column (which does not work but gives no error or warning) and a factor 
column (which does not work but gives an error).
I would appreciate someone telling me why my code does not work, and suggesting 
code that will work.
Thanks,
John

rm(dfxyz,dfxz,dfxy)

# create the data frame.
dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
dfxyz

names(dfxyz)

# try to drop y column
# does not work, does not produce error message
dfxz <- dfxyz[,-(dfxyz$y)]
dfxz

# try to drop z column
# does not work, produces error message:
# In Ops.factor(df$z) : - not meaningful for factors
dfxy <- dfxyz[,-dfxyz$z]
dfxy



John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Dennis Murphy
Hi:


On Sun, Dec 26, 2010 at 4:18 AM, bbslover  wrote:

>
> Dear all,
>
> My double for loop as follows, but it is little efficient, I hope all
> friends can give me a "vectorized" program to replace my code. thanks
>
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>   for (j in 1:num.compd)
>  if (i!=j) {
>

Isn't this just X'X?

>         S1<-sum(x[i,]*x[j,])
>
Aren't each of S2 and S3 just diag(X'X)?

>         S2<-sum(x[i,]^2)
>         S3<-sum(x[j,]^2)
>         sim2<-S1/(S2+S3-S1)
>         diss2<-1-sim2
>         diss.all<-diss.all+diss2}
>

I tried
s1 <- crossprod(x)
s2 <- diag(s1)
s3 <-outer(s2, s2, '+') - s1
s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities
between 0 and 1 in the off-diagonal. Something like it could conceivably be
used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved
with some matrix algebra. R can do matrix algebra quite efficiently,
y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:
system.time(myfunc(x))
   user  system elapsed
   0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew
what you were computing... )

HTH,
Dennis
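
One detail worth checking when trying this: crossprod(x) gives products
between columns (variables), while tcrossprod(x) gives products between rows
(samples). Since the loop in the original post indexes rows, x[i,] and
x[j,], the row-wise product seems to be the one that matches it. A small
sketch on a made-up 6 x 4 matrix:

```r
set.seed(42)
x <- matrix(runif(6 * 4), nrow = 6)   # 6 samples (rows), 4 variables (columns)

dim(crossprod(x))    # t(x) %*% x : 4 x 4, one entry per pair of columns
dim(tcrossprod(x))   # x %*% t(x) : 6 x 6, one entry per pair of rows

# Row-wise similarity matrix, same recipe as above but with tcrossprod().
s1  <- tcrossprod(x)
s2  <- diag(s1)
sim <- s1 / (outer(s2, s2, "+") - s1)
```

For a square input matrix both products have the same shape, so the
difference is easy to miss.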

> it will cost a long time to finish this computation! i really need "rapid"
> code to replace my code.
>
> thanks
>
> kevin
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
> Sent from the R help mailing list archive at Nabble.com.
>



[R] modifying user agent strings in http requests

2010-12-26 Thread Soumendra
Hi all.

How does one change user agent strings in http requests made in R? And
how do I figure out what my current user agent string looks like?

Thanks in advance,

Soumendra

--
Don't worry about people stealing your ideas. If your ideas are any
good, you'll have to ram them down people's throats.
                                                           ---  Howard Aiken



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Spencer Graves
  Mike Marchywka's post mentioned a CRAN package, "rpubchem", 
missed by my search for "chemical formula".  A further search for 
"chemical" and "chemistry" still missed it.  "compound" found it.  
Adding "compounds" and combining them with "union" produced a list of 
564 links in 219 packages;  7 of the help pages were for "rpubchem".  
The package with the most matches is "seacarb" (seawater carbonate 
chemistry with R:  21 matches), followed by "CHNOSZ", previously 
mentioned (19 matches).  "rpubchem" is the 22nd package on this list (5 
matches, with a max score of 32, less than the max score of 2 other 
packages with 5 matches).



  Spencer



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Gabor Grothendieck
On Sun, Dec 26, 2010 at 7:26 PM, Gabor Grothendieck
 wrote:
> On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson  wrote:
>> Hello R Folks...
>>
>> I've been looking around the 'net and I see many complex solutions in
>> various languages to this question, but I have a pretty simple need (and I'm
>> not much good at regex).  I want to use a chemical formula as a function
>> argument.  The formula would be in "Hill order" which is to list C, then H,
>> then all other elements in alphabetical order.  My example will have only a
>> limited number of elements, few enough that one can search directly for each
>> element.  So some examples would be C5H12, or C5H12O or C5H11BrO (note that
>> for oxygen and bromine, O or Br, there is no following number meaning a 1 is
>> implied).
>>
>> Let's say
>>
>>> form <- "C5H11BrO"
>>
>> I'd like to get the count of each element, so in this case I need to extract
>> C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular
>> weight by multiplying).  Sounds pretty simple, but my experiments with grep
>> and strsplit don't immediately clue me into an obvious solution.  As I said,
>> I don't need a general solution to the problem of calculating molecular
>> weight from an arbitrary formula, that seems quite challenging, just a way
>> to convert "form" into a list or data frame which I can then do the math on.
>>
>> Here's hoping this is a simple issue for more experienced R users!  TIA,
>
> This can be done by strapply in gsubfn.  It matches the regular
> expression to the target string passing the back references (the
> parenthesized portions of the regular expression) through a specified
> function as successive arguments.
>
> Thus the first arg is form, your input string.  The second arg is the
> regular expression which matches an upper case letter optionally
> followed by lower case letters and all that is optionally followed by
> digits.  The third arg is a function shown in a formula
> representation. strapply passes the back references (i.e. the portions
> within parentheses) to the function as the two arguments.  Finally
> simplify is another function in formula notation which turns the
> result into a matrix and then a data frame.  Finally we make the
> second column of the data frame numeric.
>
> library(gsubfn)
>
> DF <- strapply(form,
>   "([A-Z][a-z]*)(\\d*)",
>   ~ c(..1, if (nchar(..2)) ..2 else 1),
>   simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
> DF[[2]] <- as.numeric(DF[[2]])
>
> DF looks like this:
>
>> DF
>  V1 V2
> 1  C  5
> 2  H 11
> 3 Br  1
> 4  O  1
>

Here is a variation that is slightly simpler. The function in the
third argument has been changed from c to paste so that it outputs
strings like "C 5".  With this form of output we can use read.table to
read it directly creating a data frame.

> strapply(form,
+   "([A-Z][a-z]*)(\\d*)",
+   ~ paste(..1, if (nchar(..2)) ..2 else 1),
+   simplify = ~ read.table(textConnection(..1)))
  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1
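
For comparison, here is a sketch of the same parse in base R only, without
gsubfn. It is not the strapply approach above, just the same regular
expression idea applied with gregexpr() and regmatches(); the function name
parse_formula and the column names are made up here, and like the original
it assumes a flat formula with no parentheses or charges.

```r
parse_formula <- function(form) {
  # Each token is an element symbol optionally followed by a count,
  # e.g. "C5", "H11", "Br", "O".
  tokens  <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]
  element <- sub("[0-9]+$", "", tokens)      # strip the trailing digits
  count   <- sub("^[A-Za-z]+", "", tokens)   # strip the leading letters
  count[count == ""] <- "1"                  # a missing count means 1
  data.frame(element = element, count = as.numeric(count),
             stringsAsFactors = FALSE)
}

DF <- parse_formula("C5H11BrO")
DF
```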


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson

Hi David & others...

I did find the function you recommended, plus, it's even easier (but a  
little hidden in the doc): >element(form, "mass").  But, this uses the  
atomic masses from the periodic table, which are weighted averages of  
the isotopes of each element.  What I'm doing actually involves mass  
spectrometry, so I need the isotope masses, which are integers (think  
12C, 13C, 14C, but the periodic table says 12.011 reflecting the  
relative abundances).  I used Gabor's solution and got my little  
function humming.  Plus, I have several things to read through from  
the various recommendations.


Thanks again, Bryan
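
To make the distinction concrete, a sketch of the integer-mass calculation
for the example formula. The isotope choices (12C, 1H, 79Br, 16O) are an
assumption made here for illustration; a real mass spectrometry calculation
would loop over the isotopes of interest, and the vector names are made up.

```r
# Element counts for C5H11BrO, as produced by any of the parsers in the thread.
counts <- c(C = 5, H = 11, Br = 1, O = 1)

# Integer mass numbers of one chosen isotope per element (assumed here).
iso_mass <- c(C = 12, H = 1, Br = 79, O = 16)

# Index by name so the order of the two vectors does not matter.
nominal_mass <- sum(counts * iso_mass[names(counts)])
nominal_mass   # 166
```

Swapping Br = 79 for Br = 81 gives the other bromine isotope peak.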


[R] filled.contour colors

2010-12-26 Thread randhindi

Hi,

I am trying to set the color scale in filled.contour based on a specific
value instead of a relative position.
Specifically, I want the values below 0 to be in a gradient of green, and
those above 0 to be red. 0 would be white.

I tried:

posZero = abs(min(z)) / (abs(min(z)) + max(z));
filled.contour(..., col = designer.colors(n=30, col=c("green", "white",
"red"), x=c(0, posZero, 1)))

but it does not center the white on the zero.

Thanks for your help,

Rand
-- 
View this message in context: 
http://r.789695.n4.nabble.com/filled-contour-colors-tp3164639p3164639.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 8:28 PM, Bryan Hanson wrote:

Thanks Spencer, I'll definitely have a look at this package and it's  
vignettes.  I believe I have looked at it before, but didn't catch  
it on this particular search.  Bryan


Using the thermo list that the makeup function accesses to get its  
valid atomic symbols, one can arrive at the answer you posited  
would be too difficult in your first posting, the atomic weight from  
the formula:


> str(thermo$element)
'data.frame':   130 obs. of  6 variables:
 $ element: chr  "Z" "O" "H" "He" ...
 $ state  : chr  "aq" "gas" "gas" "gas" ...
 $ source : chr  "CWM89" "CWM89" "CWM89" "CWM89" ...
 $ mass   : num  0 16 1.01 4 20.18 ...
 $ s  : num  -15.6 49 31.2 30.2 35 ...
 $ n  : int  1 2 2 1 1 1 1 1 2 2 ...

patts <- paste("^", rownames(makeup(form)), "$", sep="")
makuform <- makeup(form)
makuform$amass <- sapply(patts, function(x) {
    thermo$element[grep(x, thermo$element[[1]])[1], "mass"]
})

sum(makuform$amass *makuform$count)
# [1] 167.0457
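
The same lookup can also be written with match() instead of anchored grep
patterns. The sketch below does not use CHNOSZ: masses is a small hand-made
table of standard atomic weights (rounded to three decimals) standing in for
thermo$element, and counts stands in for the result of makeup(form), so the
names are assumptions made for illustration.

```r
# Hand-made stand-in for the element table (standard atomic weights).
masses <- data.frame(element = c("C", "H", "Br", "O"),
                     mass    = c(12.011, 1.008, 79.904, 15.999),
                     stringsAsFactors = FALSE)

# Stand-in for the parsed formula C5H11BrO.
counts <- data.frame(element = c("C", "H", "Br", "O"),
                     count   = c(5, 11, 1, 1),
                     stringsAsFactors = FALSE)

# match() returns the row of 'masses' for each element in 'counts';
# it does exact matching, so "B" can never pick up "Br".
idx <- match(counts$element, masses$element)
mw  <- sum(masses$mass[idx] * counts$count)
mw   # 167.046
```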



On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:

p.s.  help(pac=CHNOSZ) reveals that this package has 3 vignettes.   
I have not looked at these vignettes, but most vignettes provide  
excellent introductions (though rarely with complete coverage) of  
important capabilities of the package.  (The 'sos' package includes  
a vignette, which exposes more capabilities than the example below.)



##
Have you considered the 'CHNOSZ' package?



makeup("C5H11BrO" )

 count
C  5
H 11
Br 1
O  1


I found this using the 'sos' package as follows:


library(sos)
cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
cf


The print method for "cf" opened the results in a web browser,  
which showed that the "CHNOSZ" package had 14 of these 21 matches,  
and the other 7 were in 7 different packages.  Moreover, the  
"CHNOSZ" package is devoted to "Chemical Thermodynamics and  
Activity Diagrams" and provides many more capabilities that might  
interest you.



Hope this helps.
Spencer


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Mike Marchywka

I think the OP had a very limited need, but there is something
more sophisticated that may be of larger interest, called "SMILES",
which attempts to capture some structural information about a molecule
in a text string. Reducing pictures to tractable text is an important step
in many analysis efforts, and I was curious what others may be able to say about
R support for things like this.

A quick google search turned up this, 

http://cran.r-project.org/web/packages/rpubchem/rpubchem.pdf

but I wasn't sure if there are more packages for manipulating
different ball-and-stick collections (the atom and bond descriptions
could just as easily represent any other collection of nodes
and connections).

You can get some idea what this does by typing your favorite chemical
name here,

http://pubchem.ncbi.nlm.nih.gov/

and the entries give something called "Canonical SMILES structures".
For example,

http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=8030&loc=ec_rcs


IUPAC Name: thiophene
Canonical SMILES: C1=CSC=C1
InChI: InChI=1S/C4H4S/c1-2-4-5-3-1/h1-4H
InChIKey: YTPLMLYBLZKORZ-UHFFFAOYSA-N


> From: han...@depauw.edu
> To: ggrothendi...@gmail.com
> Date: Sun, 26 Dec 2010 20:01:45 -0500
> CC: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Parsing a Simple Chemical Formula
>
> Well let me just say thanks and WOW! Four great ideas, each worthy of
> study and I'll learn several things from each. Interestingly, these
> solutions seem more general and more compact than the solutions I
> found on the 'net using python and perl. More evidence for the power
> of R! A big thanks to each of you! Bryan
>
> On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:
>
> > On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson 
> > wrote:
> >> Hello R Folks...
> >>
> >> I've been looking around the 'net and I see many complex solutions in
> >> various languages to this question, but I have a pretty simple need
> >> (and I'm
> >> not much good at regex). I want to use a chemical formula as a
> >> function
> >> argument. The formula would be in "Hill order" which is to list C,
> >> then H,
> >> then all other elements in alphabetical order. My example will
> >> have only a
> >> limited number of elements, few enough that one can search directly
> >> for each
> >> element. So some examples would be C5H12, or C5H12O or C5H11BrO
> >> (note that
> >> for oxygen and bromine, O or Br, there is no following number
> >> meaning a 1 is
> >> implied).
> >>
> >> Let's say
> >>
> >>> form <- "C5H11BrO"
> >>
> >> I'd like to get the count of each element, so in this case I need
> >> to extract
> >> C and 5, H and 11, Br and 1, O and 1 (I want to calculate the
> >> molecular
> >> weight by multiplying). Sounds pretty simple, but my experiments
> >> with grep
> >> and strsplit don't immediately clue me into an obvious solution.
> >> As I said,
> >> I don't need a general solution to the problem of calculating
> >> molecular
> >> weight from an arbitrary formula, that seems quite challenging,
> >> just a way
> >> to convert "form" into a list or data frame which I can then do the
> >> math on.
> >>
> >> Here's hoping this is a simple issue for more experienced R users!
> >> TIA,
> >
> > This can be done by strapply in gsubfn. It matches the regular
> > expression to the target string passing the back references (the
> > parenthesized portions of the regular expression) through a specified
> > function as successive arguments.
> >
> > Thus the first arg is form, your input string. The second arg is the
> > regular expression which matches an upper case letter optionally
> > followed by lower case letters and all that is optionally followed by
> > digits. The third arg is a function shown in a formula
> > representation. strapply passes the back references (i.e. the portions
> > within parentheses) to the function as the two arguments. Finally
> > simplify is another function in formula notation which turns the
> > result into a matrix and then a data frame. Finally we make the
> > second column of the data frame numeric.
> >
> > library(gsubfn)
> >
> > DF <- strapply(form,
> > "([A-Z][a-z]*)(\\d*)",
> > ~ c(..1, if (nchar(..2)) ..2 else 1),
> > simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors =
> > FALSE))
> > DF[[2]] <- as.numeric(DF[[2]])
> >
> > DF looks like this:
> >
> >> DF
> > V1 V2
> > 1 C 5
> > 2 H 11
> > 3 Br 1
> > 4 O 1
> >
> >
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
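
Once the formula is in the two-column layout shown in the quoted reply
(element symbol, count), the molecular weight the original poster is after
is a table lookup followed by a weighted sum. The following is only a
minimal sketch: the data frame is rebuilt by hand so that no package is
needed, and the lookup table `atomic_weight` (with approximate standard
atomic weights typed in by hand) is an assumption made up for this
illustration, not something from the thread.

```r
# Same layout as the DF printed in the quoted reply, built by hand
DF <- data.frame(V1 = c("C", "H", "Br", "O"),
                 V2 = c(5, 11, 1, 1),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: approximate standard atomic weights in g/mol
atomic_weight <- c(C = 12.011, H = 1.008, N = 14.007,
                   O = 15.999, Br = 79.904)

# Stop early if the formula contains a symbol missing from the table
missing_sym <- setdiff(DF$V1, names(atomic_weight))
if (length(missing_sym) > 0) {
  stop("no atomic weight for: ", paste(missing_sym, collapse = ", "))
}

# Look up each element by name and weight it by its count
mol_weight <- sum(atomic_weight[DF$V1] * DF$V2)
mol_weight   # about 167.05 for C5H11BrO
```

Indexing the named vector by `DF$V1` keeps the weights in the same order as
the rows, so the element-wise product lines up without any merge step.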
  

Re: [R] Drop column from a data frame

2010-12-26 Thread jim holtman
assign NULL to the column:

> dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
> dfxyz
x  y z
1   1 11 0
2   2 12 0
3   3 13 0
4   4 14 0
5   5 15 0
6   6 16 1
7   7 17 1
8   8 18 1
9   9 19 1
10 10 20 1
> dfxyz$y <- NULL
> dfxyz
x z
1   1 0
2   2 0
3   3 0
4   4 0
5   5 0
6   6 1
7   7 1
8   8 1
9   9 1
10 10 1
>
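
As to why the two attempts in the original post behave as they do, as far
as I can tell: -(dfxyz$y) evaluates to -11, -12, ..., -20, which R reads as
column positions to drop; the data frame has only three columns, so nothing
is dropped and no error is raised. Unary minus on the factor dfxyz$z is not
meaningful at all. A minimal sketch of some base R alternatives to the NULL
assignment follows. The names dfxz and dfxy come from the original post;
`unchanged` and `dfxz_by_name` are made up for this illustration.

```r
dfxyz <- data.frame(x = 1:10, y = 11:20,
                    z = factor(c(rep(0, 5), rep(1, 5))))

# Negative positions that do not exist are silently ignored,
# so all three columns survive
unchanged <- dfxyz[, -(dfxyz$y)]
ncol(unchanged)

# Drop by position: y is the second column
dfxz <- dfxyz[, -2, drop = FALSE]

# Drop by name, which keeps working if the column order changes
dfxz_by_name <- dfxyz[, names(dfxyz) != "y", drop = FALSE]
dfxy <- dfxyz[, setdiff(names(dfxyz), "z"), drop = FALSE]

names(dfxz)
names(dfxy)
```

drop = FALSE is there so the result stays a data frame even when only one
column is left.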


On Sun, Dec 26, 2010 at 8:22 PM, John Sorkin
 wrote:
> I am trying to drop a column of a data frame. The code below attempts to drop 
> a numeric column (which does not work but gives no error or warning) and a 
> factor column (which does not work but gives an error).
> I would appreciate someone telling me why my code does not work, and 
> suggesting code that will work.
> Thanks,
> John
>
> rm(dfxyz,dfxz,dfxy)
>
> # create the data frame.
> dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
> dfxyz
>
> names(dfxyz)
>
> # try to drop y column
> # does not work, does not produce error message
> dfxz <- dfxyz[,-(dfxyz$y)]
> dfxz
>
> # try to drop z column
> # does not work, produces error message:
> # In Ops.factor(df$z) : - not meaningful for factors
> dfxy <- dfxyz[,-dfxyz$z]
> dfxy
>
>
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:17}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson
Thanks Spencer, I'll definitely have a look at this package and its
vignettes.  I believe I have looked at it before, but didn't catch it
on this particular search.  Bryan


On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:

p.s.  help(pac=CHNOSZ) reveals that this package has 3 vignettes.  I  
have not looked at these vignettes, but most vignettes provide  
excellent introductions (though rarely with complete coverage) of  
important capabilities of the package.  (The 'sos' package includes  
a vignette, which exposes more capabilities than the example below.)



##
 Have you considered the 'CHNOSZ' package?



makeup("C5H11BrO" )

  count
C  5
H 11
Br 1
O  1


 I found this using the 'sos' package as follows:


library(sos)
cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
cf


 The print method for "cf" opened the results in a web browser,  
which showed that the "CHNOSZ" package had 14 of these 21 matches,  
and the other 7 were in 7 different packages.  Moreover, the  
"CHNOSZ" package is devoted to "Chemical Thermodynamics and Activity  
Diagrams" and provides many more capabilities that might interest you.



 Hope this helps.
 Spencer







--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Peter Ehlers

On 2010-12-26 08:26, Marius Hofert wrote:

Dear David,

thank you for your answer.
As I wrote, I am looking for an option to control the *space* between the tick 
marks and the corresponding labels. I am happy with the *number* of tick marks 
and their default values. As far as I know, pscales can't control the space, so 
it is *not* what I am looking for.


Marius,
I think that you mean something like the following:

 library(lattice)
 U <- matrix(runif(300), ncol = 3)
 splom(U, par.settings = list(
   axis.components = list(
     left = list(pad1 = 3)
   )
 ))

which will adjust the left axis; you'll have to add
right, top, bottom components to handle those as well.

Have a look at what trellis.par.get() produces and
check the axis.components section.
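
A sketch of the "add right, top, bottom" step, under the assumption that
the same pad1 value is wanted on all four sides. The value 0.2 and the
helper name `pad` are made up for this illustration; whether a value below
the default actually tightens the labels enough is something to check
visually on your own device.

```r
library(lattice)

# Inspect the defaults; each side has tck, pad1 and pad2 entries
str(trellis.par.get("axis.components"))

# One shared setting, reused for every side (hypothetical value)
pad <- list(pad1 = 0.2)

U <- matrix(runif(300), ncol = 3)
p <- splom(U,
           par.settings = list(
             axis.components = list(left = pad, right = pad,
                                    top = pad, bottom = pad)))
print(p)
```

Passing the settings through par.settings keeps the change local to this
one plot instead of altering the session-wide theme.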

Peter Ehlers



Cheers,

Marius

On 2010-12-26, at 14:36 , David Winsemius wrote:



On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:


Dear expeRts,

how can I decrease the space between the tick marks and the corresponding 
labels in an splom?
See here:

library(lattice)
U <- matrix(runif(4000), ncol = 8)
splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and
                              #    tick marks is/seems to be too large


So you want more tick marks?



I checked ?panel.pairs but could not find an option for that.


What about the pscales argument?

A single number would increase the number of ticks, or a list with "at" and 
"labels" values can be passed. Seems to be just what you asked for.

--

David Winsemius, MD
West Hartford, CT



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Drop column from a data frame

2010-12-26 Thread John Sorkin
I am trying to drop a column of a data frame. The code below attempts to drop a 
numeric column (which does not work but gives no error or warning) and a factor 
column (which does not work but gives an error).
I would appreciate someone telling me why my code does not work, and suggesting 
code that will work.
Thanks,
John

rm(dfxyz,dfxz,dfxy)

# create the data frame.
dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
dfxyz

names(dfxyz)

# try to drop y column
# does not work, does not produce error message
dfxz <- dfxyz[,-(dfxyz$y)]
dfxz

# try to drop z column
# does not work, produces error message:
# In Ops.factor(df$z) : - not meaningful for factors
dfxy <- dfxyz[,-dfxyz$z]
dfxy



John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Spencer Graves
p.s.  help(pac=CHNOSZ) reveals that this package has 3 vignettes.  I 
have not looked at these vignettes, but most vignettes provide excellent 
introductions (though rarely with complete coverage) of important 
capabilities of the package.  (The 'sos' package includes a vignette, 
which exposes more capabilities than the example below.)



##
  Have you considered the 'CHNOSZ' package?



makeup("C5H11BrO" )

   count
C  5
H 11
Br 1
O  1


  I found this using the 'sos' package as follows:


library(sos)
cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
cf


  The print method for "cf" opened the results in a web browser, 
which showed that the "CHNOSZ" package had 14 of these 21 matches, and 
the other 7 were in 7 different packages.  Moreover, the "CHNOSZ" 
package is devoted to "Chemical Thermodynamics and Activity Diagrams" 
and provides many more capabilities that might interest you.



  Hope this helps.
  Spencer







--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Spencer Graves

  Have you considered the 'CHNOSZ' package?


> makeup("C5H11BrO" )
   count
C  5
H 11
Br 1
O  1


  I found this using the 'sos' package as follows:


library(sos)
cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
cf


  The print method for "cf" opened the results in a web browser, 
which showed that the "CHNOSZ" package had 14 of these 21 matches, and 
the other 7 were in 7 different packages.  Moreover, the "CHNOSZ" 
package is devoted to "Chemical Thermodynamics and Activity Diagrams" 
and provides many more capabilities that might interest you.



  Hope this helps.
  Spencer







--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson
Well let me just say thanks and WOW!  Four great ideas, each worthy of  
study and I'll learn several things from each.  Interestingly, these  
solutions seem more general and more compact than the solutions I  
found on the 'net using python and perl.  More evidence for the power  
of R!  A big thanks to each of you!  Bryan



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 6:29 PM, Bryan Hanson wrote:


Hello R Folks...

I've been looking around the 'net and I see many complex solutions  
in various languages to this question, but I have a pretty simple  
need (and I'm not much good at regex).  I want to use a chemical  
formula as a function argument.  The formula would be in "Hill  
order" which is to list C, then H, then all other elements in  
alphabetical order.  My example will have only a limited number of  
elements, few enough that one can search directly for each element.   
So some examples would be C5H12, or C5H12O or C5H11BrO (note that  
for oxygen and bromine, O or Br, there is no following number  
meaning a 1 is implied).


Let's say

> form <- "C5H11BrO"


Well here's how I see it:

The "form" can be split with a regular expression:
a capital letter followed by zero or one lower-case letter, followed by any
number of digits


greg <- gregexpr("[A-Z]{1}[a-z]?[0-9]*", form)

Append a number equal to one more than the length of the string, for
reasons that will become clear


ugreg <- c(unlist(greg), nchar(form)+1)

Then use the substr function to serially pick from a split point to one
less than the next split point (for the last element, the appended value
makes this the end of the string):


> sapply(1:(length(ugreg)-1), function(z) substr(form, ugreg[z], ugreg[z+1]-1) )

[1] "C5"  "H11" "Br"  "O"

Then you can split these "triples" (cap,lower,n) and if n is absent
assume 1.


> sub("(\\d*)$", "", sapply(1:(length(ugreg)-1),   # blank out the digits
+     function(z) substr(form, ugreg[z], ugreg[z+1]-1) ) )
[1] "C"  "H"  "Br" "O"

> sub("^$", "1", sub("([A-Za-z]*)", "",   # subst "1" for empty strings
+     sapply(1:(length(ugreg)-1),
+       function(z) substr(form, ugreg[z], ugreg[z+1]-1) ) ) )

[1] "5"  "11" "1"  "1"

If you limited the number of elements searched for, it might improve  
the error trapping, I suppose.
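
The gregexpr steps above can probably be condensed, and the error trapping
made explicit, along the following lines. This is only a sketch: it uses
regmatches(), which exists in base R from version 2.14.0 on, and the
`allowed` vector of element symbols is a made-up example list, not anything
from the original post.

```r
form <- "C5H11BrO"

# Pull out every "symbol plus optional digits" token in one step
tokens <- regmatches(form, gregexpr("[A-Z][a-z]?[0-9]*", form))[[1]]

# Split each token into its symbol and its count (empty count means 1)
element <- sub("[0-9]*$", "", tokens)
digits  <- sub("^[A-Za-z]+", "", tokens)
count   <- as.numeric(ifelse(digits == "", "1", digits))

# Error trapping 1: the tokens must account for the whole input string
if (paste(tokens, collapse = "") != form) {
  stop("could not parse all of: ", form)
}

# Error trapping 2: only accept symbols from a known, limited set
allowed <- c("C", "H", "N", "O", "S", "Cl", "Br")   # hypothetical list
unknown <- setdiff(element, allowed)
if (length(unknown) > 0) {
  stop("unexpected element(s): ", paste(unknown, collapse = ", "))
}

result <- data.frame(element = element, count = count,
                     stringsAsFactors = FALSE)
result
```

Checking that the pasted tokens reproduce the input catches stray characters
such as lower-case letters or punctuation that the pattern would otherwise
skip silently.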


--
David.




I'd like to get the count of each element, so in this case I need to  
extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate  
the molecular weight by multiplying).  Sounds pretty simple, but my  
experiments with grep and strsplit don't immediately clue me into an  
obvious solution.  As I said, I don't need a general solution to the  
problem of calculating molecular weight from an arbitrary formula,  
that seems quite challenging, just a way to convert "form" into a  
list or data frame which I can then do the math on.


Here's hoping this is a simple issue for more experienced R users!   
TIA,  Bryan

***
Bryan Hanson
Professor of Chemistry & Biochemistry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.data? without separator

2010-12-26 Thread jim holtman
I have a problem with 'read.data' also, in that I don't see it as a
function in 'base'; I assume you meant read.table.

Also, you did not indicate whether all the lines are the same length.  Here
is a solution that returns a list with each character broken out
separately.

> x <- readLines(textConnection("# comment
+ 1?0001010101
+ 101010??1010"))
> closeAllConnections()
> # split lines 2-n into a list of separate characters
> result <- lapply(x[-1], function(.line) strsplit(.line, '')[[1]])
> result
[[1]]
 [1] "1" "?" "0" "0" "0" "1" "0" "1" "0" "1" "0" "1"

[[2]]
 [1] "1" "0" "1" "0" "1" "0" "?" "?" "1" "0" "1" "0"
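
If every data line has the same length, the list can be stacked into a character matrix so that each character sits in its own column, which is what the original question asked for. A minimal sketch, assuming the file contents are already in a character vector (the names `lines`, `dat`, `m` and `char_df` are illustrative and do not come from the thread):

```r
# Illustrative input; in practice this could come from readLines() on the file
lines <- c("# comment", "1?0001010101", "101010??1010")

# Drop comment lines, then check that the remaining lines are equally long
dat <- lines[!grepl("^#", lines)]
stopifnot(length(unique(nchar(dat))) == 1)

# Split each line into single characters and stack the pieces row-wise
m <- do.call(rbind, strsplit(dat, ""))
char_df <- as.data.frame(m, stringsAsFactors = FALSE)
dim(m)
```

If the lines differ in length, rbind() would recycle the shorter ones, so the length check is worth keeping.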



On Sun, Dec 26, 2010 at 1:04 PM, Fror  wrote:
>
> Hello,
>
> I have a problem with read.data. For example I have a file
>
> # comment
> 1?0001010101
> 101010??1010
>
> with comment on first line and data layout without separator. How I could
> read data that each character\sign was in another column. It is trivial
> probably, but I have no idea for it.
>
> Thank's,
> Kacper
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/read-data-without-separator-tp3164358p3164358.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread Gabor Grothendieck
On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson  wrote:
> Hello R Folks...
>
> I've been looking around the 'net and I see many complex solutions in
> various languages to this question, but I have a pretty simple need (and I'm
> not much good at regex).  I want to use a chemical formula as a function
> argument.  The formula would be in "Hill order" which is to list C, then H,
> then all other elements in alphabetical order.  My example will have only a
> limited number of elements, few enough that one can search directly for each
> element.  So some examples would be C5H12, or C5H12O or C5H11BrO (note that
> for oxygen and bromine, O or Br, there is no following number meaning a 1 is
> implied).
>
> Let's say
>
>> form <- "C5H11BrO"
>
> I'd like to get the count of each element, so in this case I need to extract
> C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular
> weight by mulitplying).  Sounds pretty simple, but my experiments with grep
> and strsplit don't immediately clue me into an obvious solution.  As I said,
> I don't need a general solution to the problem of calculating molecular
> weight from an arbitrary formula, that seems quite challenging, just a way
> to convert "form" into a list or data frame which I can then do the math on.
>
> Here's hoping this is a simple issue for more experienced R users!  TIA,

This can be done by strapply in gsubfn.  It matches the regular
expression to the target string passing the back references (the
parenthesized portions of the regular expression) through a specified
function as successive arguments.

Thus the first arg is form, your input string.  The second arg is the
regular expression which matches an upper case letter optionally
followed by lower case letters and all that is optionally followed by
digits.  The third arg is a function shown in a formula
representation. strapply passes the back references (i.e. the portions
within parentheses) to the function as the two arguments.  The
simplify argument is another function in formula notation which turns the
result into a matrix and then a data frame.  Finally we make the
second column of the data frame numeric.

library(gsubfn)

DF <- strapply(form,
   "([A-Z][a-z]*)(\\d*)",
   ~ c(..1, if (nchar(..2)) ..2 else 1),
   simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
DF[[2]] <- as.numeric(DF[[2]])

DF looks like this:

> DF
  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1
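
For readers without gsubfn, roughly the same result can be had in base R with gregexpr() and regmatches(). This is only a sketch under the same assumptions as above (one upper-case letter, optional lower-case letters, optional digits); the function name `parse_formula` and the column names are made up for illustration:

```r
parse_formula <- function(form) {
  # Pull out every "element + optional count" token, e.g. "C5", "H11", "Br", "O"
  tokens <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]
  # Strip the trailing digits to get the element symbol
  element <- sub("[0-9]*$", "", tokens)
  # Strip the leading letters to get the count; an empty count means 1
  count <- sub("^[A-Za-z]*", "", tokens)
  count[count == ""] <- "1"
  data.frame(element = element, count = as.numeric(count),
             stringsAsFactors = FALSE)
}

parse_formula("C5H11BrO")
```

Like the strapply version, this silently skips characters that do not fit the pattern, so it does no error checking on malformed formulas.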



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread David A. Johnston

There might be something simpler, but this is what I came up with:

form = "C5H11BrO"
ups = c(gregexpr("[[:upper:]]", form)[[1]], nchar(form) + 1)
seperated = sapply(1:(length(ups)-1), function(x) substr(form, ups[x],
ups[x+1] - 1))
elements =  gsub("[[:digit:]]", "", seperated)
nums = gsub("[[:alpha:]]", "", seperated)
ans = data.frame(element = as.character(elements),
  num = as.numeric(ifelse(nums == "", 1, nums)), stringsAsFactors = FALSE)
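
Once the formula is in a data frame like `ans`, the molecular weight the original poster was after is a lookup and a sum. A small sketch, assuming a hand-made table of rounded standard atomic weights (the vector `atomic_weight` is illustrative and covers only the elements in this example):

```r
# Same shape as the data frame built above, written out so this runs on its own
ans <- data.frame(element = c("C", "H", "Br", "O"),
                  num = c(5, 11, 1, 1),
                  stringsAsFactors = FALSE)

# Rounded standard atomic weights; extend as needed
atomic_weight <- c(C = 12.011, H = 1.008, O = 15.999, Br = 79.904)

# Fail loudly on any element missing from the lookup table
unknown <- setdiff(ans$element, names(atomic_weight))
if (length(unknown) > 0) stop("no atomic weight for: ", toString(unknown))

# Multiply each count by its atomic weight and add up
mol_weight <- sum(ans$num * atomic_weight[ans$element])
mol_weight
```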
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Parsing-a-Simple-Chemical-Formula-tp3164562p3164581.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Parsing a Simple Chemical Formula

2010-12-26 Thread jim holtman
try this:

> f.extract <- function(formula)
+ {
+ # pattern to match the initial chemical
+ # assumes chemical starts with an upper case and optional lower case
+ # followed by zero or more digits.
+ first <- "^([[:upper:]][[:lower:]]?)([0-9]*).*"
+ # inverse of above to remove the initial chemical
+ last <- "^[[:upper:]][[:lower:]]?[0-9]*(.*)"
+ result <- list()
+ extract <- formula
+ # repeat as long as there is data
+ while ((start <- nchar(extract)) > 0){
+ chem <- sub(first, '\\1 \\2', extract)
+ extract <- sub(last, '\\1', extract)
+ # if the number of characters is the same, then there was an error
+ if (nchar(extract) == start){
+ warning("Invalid formula:", formula)
+ return(NULL)
+ }
+ # append to the list
+ result[[length(result) + 1L]] <- strsplit(chem, ' ')[[1]]
+ }
+ result
+ }
> f.extract("C5H11BrO")
[[1]]
[1] "C" "5"

[[2]]
[1] "H"  "11"

[[3]]
[1] "Br"

[[4]]
[1] "O"

> f.extract("H2O")
[[1]]
[1] "H" "2"

[[2]]
[1] "O"

> f.extract("CCC")
[[1]]
[1] "C"

[[2]]
[1] "C"

[[3]]
[1] "C"

> f.extract("Crr")  # bad
NULL
Warning message:
In f.extract("Crr") : Invalid formula:Crr
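
Note that the list returned above leaves out the implied count of 1 for "Br" and "O". A possible follow-up step, sketched here on a hand-written list with the same shape as the f.extract() result (the names `parsed`, `element`, `count` and `result` are illustrative), fills in the missing counts and builds a data frame:

```r
# Same shape as f.extract("C5H11BrO"): length-2 entries carry a count,
# length-1 entries have an implied count of 1
parsed <- list(c("C", "5"), c("H", "11"), "Br", "O")

element <- vapply(parsed, function(p) p[1], character(1))
count <- vapply(parsed,
                function(p) if (length(p) > 1) as.numeric(p[2]) else 1,
                numeric(1))

result <- data.frame(element, count, stringsAsFactors = FALSE)
result
```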
>
>
On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson  wrote:
> Hello R Folks...
>
> I've been looking around the 'net and I see many complex solutions in
> various languages to this question, but I have a pretty simple need (and I'm
> not much good at regex).  I want to use a chemical formula as a function
> argument.  The formula would be in "Hill order" which is to list C, then H,
> then all other elements in alphabetical order.  My example will have only a
> limited number of elements, few enough that one can search directly for each
> element.  So some examples would be C5H12, or C5H12O or C5H11BrO (note that
> for oxygen and bromine, O or Br, there is no following number meaning a 1 is
> implied).
>
> Let's say
>
>> form <- "C5H11BrO"
>
> I'd like to get the count of each element, so in this case I need to extract
> C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular
> weight by mulitplying).  Sounds pretty simple, but my experiments with grep
> and strsplit don't immediately clue me into an obvious solution.  As I said,
> I don't need a general solution to the problem of calculating molecular
> weight from an arbitrary formula, that seems quite challenging, just a way
> to convert "form" into a list or data frame which I can then do the math on.
>
> Here's hoping this is a simple issue for more experienced R users!  TIA,
>  Bryan
> ***
> Bryan Hanson
> Professor of Chemistry & Biochemistry
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



[R] GLS with corAR(1) correlation structure residual/standard error calculation

2010-12-26 Thread Katharina Ley
I am using the gls function to fit a two-stage least squares model with
first order autoregressive error terms. Since there is no automated
adjustment for the use of two-stage least squares in this package, I am
trying to manually replicate standard errors of the coefficient estimates in
order to adjust for a first stage OLS estimate of endogenous variables.
However, thus far I have been unable to replicate the residuals or standard
errors produced by this function. My understanding is outlined below, but
using this approach does not yield the reported results. Is anyone familiar
with the inner workings of this function who can either explain the
calculation of the standard errors or provide code that reproduces it?

Thanks!

Example of the model I am running:
model1<- gls(Y~ X1I + X2 + X3 + X4, data=Dat1, correlation = corAR1(),
method = "ML")

My understanding of model errors:
Y = b_0 + X1 b_1+ ...Xk b_k + Z
Z_t = phi Z_{t-1} + e_t

The residuals reported by GLS are the Z's, while the white noise terms are
the e's. I cannot replicate the reported residuals using this approach. I
also do not know how Z_0 should be calculated, i.e. what does the first step
of this recursive procedure look like?

From the residuals, I also cannot replicate the reported standard errors. I
am using se(b_j) = sqrt(sigma^2/sum((x_i-x_mean)^2)) where sigma = sqrt(SSR/df)

Any help on this or explanation of how GLS works would be much appreciated.

Any clarification would be much appreciated.
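
For reference, the textbook treatment of AR(1) errors can be written out in a few lines of base R. This is only a sketch of the standard generalized least squares algebra for a known phi, not a description of what gls() does internally: the first observation is scaled by sqrt(1 - phi^2) because it has no predecessor, the remaining innovations are e_t = Z_t - phi Z_{t-1}, and the coefficient covariance is sigma^2 (X' V^-1 X)^-1 rather than the simple-regression formula quoted above. The function names and the simulated data are made up, and whether gls() uses the same degrees-of-freedom scaling under method = "ML" is not verified here.

```r
# Whitening matrix P for a stationary AR(1) process with coefficient phi:
# P %*% z turns correlated errors z into uncorrelated innovations e
ar1_whitener <- function(n, phi) {
  P <- diag(n)
  P[1, 1] <- sqrt(1 - phi^2)          # first observation has no predecessor
  P[cbind(2:n, 1:(n - 1))] <- -phi    # e_t = z_t - phi * z_{t-1} for t >= 2
  P
}

# Textbook GLS for a known phi: least squares on the transformed data
ar1_gls <- function(X, y, phi) {
  P <- ar1_whitener(length(y), phi)
  Xs <- P %*% X
  ys <- as.vector(P %*% y)
  fit <- lm.fit(Xs, ys)
  sigma2 <- sum(fit$residuals^2) / (length(y) - ncol(X))
  list(coef = fit$coefficients,
       se = sqrt(diag(sigma2 * solve(crossprod(Xs)))),
       innovations = fit$residuals)
}

# Simulated example data, for illustration only
set.seed(1)
x <- rnorm(40)
X <- cbind(intercept = 1, x = x)
y <- 2 + 0.5 * x + as.numeric(arima.sim(list(ar = 0.6), n = 40))
ar1_gls(X, y, phi = 0.6)$se
```

With phi = 0 the transformation is the identity and the result reduces to ordinary least squares, which gives a quick sanity check.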



Re: [R] levelplot blocks size

2010-12-26 Thread jonathan

Thanks for your advice, but my data is not decimals, so I don't need to round
the values. Instead, what I need to really do is "group" the values into
larger "blocks".

My data looks sort of like this:

x  y  z
0  0  687
0  1  64
0  2  71
0  3  55
0  4  52
0  5  51
0  6  38
0  7  38
0  8  54
0  9  49
. 
. 
. 
987   9881
999   9981
999   9991


But what I need to do is make it so that on the graph rather than having
tiny little dots for each point (as shown in the bigplot diagram), there are
bigger points, so say 0<=x<10, 0<=y<10 is one point in the lower left,
rather than having 100 points for each x,y value.

The same strategy should then be applied to the whole graph.

Any ideas how to achieve this? I'm sure this is quite a common thing to
want to do with heatmaps.

Thanks,

Jonathan
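
One way to get the larger blocks described above is to bin x and y with integer division and aggregate z within each bin before plotting. A minimal sketch on made-up data; the block size of 10, the use of sum() as the summary, and the column names `xb` and `yb` are all assumptions, and mean() may suit a heatmap better depending on what z represents:

```r
# Made-up stand-in for the real data: one row per (x, y) pair
d <- expand.grid(x = 0:29, y = 0:29)
d$z <- d$x + d$y

# Lower-left corner of the block each point falls into
block <- 10
d$xb <- (d$x %/% block) * block
d$yb <- (d$y %/% block) * block

# One row per block, with z summed over the points in that block
blocks <- aggregate(z ~ xb + yb, data = d, FUN = sum)
head(blocks)

# The aggregated frame can then be plotted as before, for example:
# lattice::levelplot(z ~ xb * yb, data = blocks)   # assumes lattice is installed
```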

-- 
View this message in context: 
http://r.789695.n4.nabble.com/levelplot-blocks-size-tp3089972p3164564.html
Sent from the R help mailing list archive at Nabble.com.



[R] Parsing a Simple Chemical Formula

2010-12-26 Thread Bryan Hanson

Hello R Folks...

I've been looking around the 'net and I see many complex solutions in  
various languages to this question, but I have a pretty simple need  
(and I'm not much good at regex).  I want to use a chemical formula as  
a function argument.  The formula would be in "Hill order" which is to  
list C, then H, then all other elements in alphabetical order.  My  
example will have only a limited number of elements, few enough that  
one can search directly for each element.  So some examples would be  
C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or  
Br, there is no following number meaning a 1 is implied).


Let's say

> form <- "C5H11BrO"

I'd like to get the count of each element, so in this case I need to  
extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the  
molecular weight by multiplying).  Sounds pretty simple, but my
experiments with grep and strsplit don't immediately clue me into an  
obvious solution.  As I said, I don't need a general solution to the  
problem of calculating molecular weight from an arbitrary formula,  
that seems quite challenging, just a way to convert "form" into a list  
or data frame which I can then do the math on.


Here's hoping this is a simple issue for more experienced R users!   
TIA,  Bryan

***
Bryan Hanson
Professor of Chemistry & Biochemistry



Re: [R] T2 hoteling

2010-12-26 Thread Jim Lemon

On 12/27/2010 12:43 AM, leyla khodakarim wrote:

Dear All

It is very kind of you to guide me.

When I want to run this line, I see this error

stat.obs<- apply(GS, 2, function(z) Hott2(t(DATA[which(z==1),]), cl))

Error in colSums(w * x) : 'x' must be an array of at least two dimensions

cl<- as.factor(y)

GS: a matrix with 0 or 1

GS: gene sets

->  a data matrix with rows=genes,

columns= gene sets,

GS[i,j]=1 if gene i in gene set j

GS[i,j]=0 otherwise

Hott2<- function(x, y, var.equal=TRUE) #T2 hoteling

Y<- c(1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,1,0,1,0,1)

Data=transpose(X)= gene expression: row=40 gene, column=10 sample

Data: there is in attachment file


Hi Leyla,
Your attachment didn't make it to the list, but the problem may be that 
which(z==1) reduces the matrix (array? data frame?) X to a vector. One 
other thing that looks funny is the capitalization. In R, X and x are 
different, as are DATA and Data. First thing is to just print out the 
data you are trying to analyze:


DATA[which(z==1)]

and see if it really is an array with at least two dimensions.

Jim
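
The reduction to a vector mentioned above happens whenever a single row is selected from a matrix, for example when a gene set contains only one gene. A small sketch on a made-up matrix (the 4 x 3 `DATA` and the indicator `z` are illustrative) shows the effect and the usual remedy, drop = FALSE:

```r
# Made-up expression matrix: 4 genes (rows) by 3 samples (columns)
DATA <- matrix(1:12, nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
z <- c(0, 1, 0, 0)   # a gene set containing a single gene

as_vector <- DATA[which(z == 1), ]                 # dimensions are dropped
as_matrix <- DATA[which(z == 1), , drop = FALSE]   # stays a 1 x 3 matrix

dim(as_vector)
dim(as_matrix)
```

Whether this is the actual cause of the colSums error cannot be confirmed without the missing attachment.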



[R] read.data? without separator

2010-12-26 Thread Fror

Hello,

I have a problem with read.data. For example I have a file

# comment
1?0001010101
101010??1010

with a comment on the first line and data laid out without a separator. How
could I read the data so that each character/sign ends up in its own column?
It is probably trivial, but I have no idea how to do it.

Thanks,
Kacper
-- 
View this message in context: 
http://r.789695.n4.nabble.com/read-data-without-separator-tp3164358p3164358.html
Sent from the R help mailing list archive at Nabble.com.



[R] Question about mars() -function

2010-12-26 Thread Tiina Hakanen

Hi!

I have some questions about the MARS model's coefficient of determination.
I use the MARS method in my master's thesis and I have noticed some
problems with the MARS model's R^2.

You can see in the following example that the MARS model's R^2 is too big
when I use the mars() function to build the model, whereas when I build
the MARS model using a linear regression, it gives a much smaller R^2.


So can you please tell me why the MARS model's R^2 is so big? How can I
get the MARS model's correct R^2 in R, in some way other than the
following example or calculating it myself from the R^2 formula?


I hope you can reply soon.

Best regards,

Tiina Hakanen


library(ElemStatLearn)
library(mda)
data<-ozone
m<-mars(data[,-1], data[,1], nk=4)
m$factor[m$s,]
m$cuts[m$s,]
m$coef
marsmodel<-lm(data[,1]~m$x-1)
summary(marsmodel)

Call:
lm(formula = data[, 1] ~ m$x - 1)

Residuals:
Min  1Q  Median  3Q Max
-36.264 -15.993  -2.351   9.993 122.793

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
m$x1  52.9783 3.8894  13.621  < 2e-16 ***
m$x2   4.7383 0.9599   4.936 2.92e-06 ***
m$x3  -1.9428 0.3084  -6.300 6.61e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.38 on 108 degrees of freedom
Multiple R-squared: 0.8147, Adjusted R-squared: 0.8095
F-statistic: 158.2 on 3 and 108 DF,  p-value: < 2.2e-16

knot1 <- function (x,k) ifelse(x > k, x-k, 0)
knot2 <- function(x, k) ifelse(x < k, k-x, 0)
reg <- lm(ozone ~knot1(temperature,85)+knot2(temperature,85),data=data)

summary(reg)

Call:
lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature,
85), data = data)

Residuals:
Min  1Q  Median  3Q Max
-36.264 -15.993  -2.351   9.993 122.793

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.9783 3.8894  13.621  < 2e-16 ***
knot1(temperature, 85)   4.7383 0.9599   4.936 2.92e-06 ***
knot2(temperature, 85)  -1.9428 0.3084  -6.300 6.61e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.38 on 108 degrees of freedom
Multiple R-squared: 0.5153, Adjusted R-squared: 0.5064
F-statistic: 57.42 on 2 and 108 DF,  p-value: < 2.2e-16
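
The two summaries above have identical residuals and coefficients, so the fits are the same; the difference in R^2 is consistent with how summary.lm() treats a formula without an intercept term. When the intercept is removed with "- 1", R^2 is computed against the uncentered total sum of squares, sum(y^2), even if the design matrix happens to contain a constant column. A sketch on the built-in airquality data (a stand-in chosen for illustration, not the ozone data from ElemStatLearn) reproduces the effect:

```r
d <- na.omit(airquality[, c("Ozone", "Temp")])

# Hinge basis with an explicit constant column, fitted without an intercept term
X <- cbind(1, pmax(d$Temp - 85, 0), pmax(85 - d$Temp, 0))
fit_noint <- lm(d$Ozone ~ X - 1)

# The same basis fitted with the usual intercept
fit_int <- lm(Ozone ~ pmax(Temp - 85, 0) + pmax(85 - Temp, 0), data = d)

r2_noint <- summary(fit_noint)$r.squared
r2_int   <- summary(fit_int)$r.squared

# Both fits share the same residual sum of squares; only the baseline differs
rss <- sum(residuals(fit_int)^2)
r2_centered   <- 1 - rss / sum((d$Ozone - mean(d$Ozone))^2)
r2_uncentered <- 1 - rss / sum(d$Ozone^2)

c(no_intercept = r2_noint, with_intercept = r2_int)
```

Under this reading, the smaller value is the conventional R^2, and it can be recovered from the first fit by computing the centered version by hand.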



Re: [R] A question on Statistics

2010-12-26 Thread Bert Gunter
Maithula:

On Sun, Dec 26, 2010 at 11:09 AM, Maithula Chandrashekhar
 wrote:
> I am not a pure Statistics background and therefore please forgive me if
> this question (which is not R related either) is too trivial.
>
> In many Statistics literature I find following statement: "restrictions in
> different coefficients matrices have to be imposed to ensure uniqueness of
> the parametrization". Can somebody tell me what is the meaning of Uniqueness
> in the parametrization? Does it mean that, two different coefficient
> matrices may give exactly the same result, and therefore coefficient matrix
> is not unique?
-- yes.

See the section on "contrast matrices" in Venables and Ripley's
"Modern Applied Statistics with S" (MASS) for a concise but, I think,
illuminating explanation. (It's in the chapter on linear
models/regression).

-- Bert
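
A tiny illustration of the non-uniqueness, in the spirit of the contrast-matrix discussion referred to above: with an intercept plus one dummy column per factor level, two different coefficient vectors give exactly the same fitted values. The factor `g` and the coefficient vectors `b1` and `b2` are made up for this sketch.

```r
g <- factor(c("a", "a", "b", "b", "c", "c"))

# Over-parameterized design: intercept plus a dummy column for every level
X <- cbind(intercept = 1, model.matrix(~ g - 1))

b1 <- c(0, 1, 2, 3)   # no intercept, group means 1, 2, 3
b2 <- c(1, 0, 1, 2)   # intercept 1, group offsets 0, 1, 2

drop(X %*% b1)
drop(X %*% b2)        # same fitted values as b1
qr(X)$rank            # 3, although X has 4 columns

# Imposing a restriction (here treatment contrasts) removes the ambiguity
X_restricted <- model.matrix(~ g)
qr(X_restricted)$rank == ncol(X_restricted)
```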

>
> I find there are many members (perhaps all) in this forum who are really
> masters in Statistics. Therefore I hope somebody will clarify me with the
> intuition behind that.
>
> Thanks,
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml



Re: [R] can't install R with *local* gcc

2010-12-26 Thread peter dalgaard

On Dec 26, 2010, at 17:50 , Oliver Kullmann wrote:

> Hello,
> 
> we re-distribute R with our open-source platform 
> http://www.ok-sat-library.org/
> where we use R mainly for evaluation of computational experiments.
> Due to the various platforms, we build everything from source, and that works 
> fine.
> Until now, that is: there are circumstances (for example in computer-science 
> computer labs)
> where no Fortran-compiler is provided, and the users (students) can't change 
> that.
> Thus we now try to build gfortran as part of the GCC version 4.2.4 suite, and 
> building
> R using that local gcc.
> We already use the local C and C++ compiler of the suite extensively, and that
> all works. But we don't have any experience with using gfortran.
> The gcc-build works fine, everything seems alright --- only R (version 
> 2.11.0) won't build with it:
> 
> We use the configuration
> 
> F77=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
>  
> FC=${F77} 
> CC=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gcc
>  
> CXX=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/g++
>  
> LDFLAGS="-L 
> /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib"
>  
> ./configure 
> --prefix=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/R/2.11.0
> 
> (the same problems with "lib64" instead of "lib", by the way)
> 
> which yields
> 
> checking for Fortran 77 libraries of 
> /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran...
>   
> -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib
>  
> -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4
>  
> -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../lib64
>  -L/lib/../lib64 -L/usr/lib/../lib64 
> -L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../..
>  -lgfortranbegin -lgfortran -lm 
> /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/libgfortran.a
> 
> which looks alright to me (but I don't know Fortran), but then we get
> 
> checking for dummy main to link with Fortran 77 libraries... none
> checking for Fortran 77 name-mangling scheme... lower case, underscore, no 
> extra underscore
> checking whether 
> /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
>  appends underscores to external names... yes
> checking whether 
> /home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
>  appends extra underscores to external names... no
> checking whether mixed C/Fortran code can be run... configure: WARNING: 
> cannot run mixed C/Fortran code
> configure: error: Maybe check LDFLAGS for paths to Fortran libraries?
> make: *** [R_base] Error 1
> 
> The R installation-documentation doesn't say much on using local compilers 
> (more or less nothing), and everything we could
> get from it are the above settings of environment variables.
> 
> Internet search reveals old stuff on "libg2c" which appears not to exist 
> anymore, some recommendations
> not to build from sources (which is not an option for us), an open Sage 
> ticket (apparently without any
> further work on it), and a request to the R-list with apparently no reply.
> 
> Since we are working in a well-defined setting (gcc is fully under our 
> control), and apparently
> all the libraries needed are build by gcc (though this is nowhere said or 
> (dream) specified),
> it should be possible to solve that problem.
> 
> I very hope to get some hints (we can't get R running (for our system!) 
> otherwise).
> The error is exactly the same on various systems (all 64-bit machines, Intel 
> and AMD).
> If we use the system-gcc (4.5.0 or 4.1.2) then the installation of R works 
> without problems;
> here (for one of the machines) some data

I suppose r-devel would be a better mailing list for this sort of thing, but 
since we're here:

Hint #1: Expect the process to be somewhat painful...
Hint #2: Study the configure script and config.log to the level where you can 
reproduce the  mixed C/Fortran code that it is trying to build and run and with 
which commands it is trying to build it
Hint #3: Figure out what it really should have done to build such code

An alternative hint is first to try setting up a very simple Fortran function 
to, say, double a number, and a C main program that calls it. Then try figuring 
out the compiler/linker options to make it work. (That is of course what 
configure was trying to do in the first place, but doing it by hand might be 
less prone to getting multiple toolchains mixed up.)


> 
>> version
> 

Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread RICHARD M. HEIBERGER
Dror,

Please look at the
demo(MMC.apple)
in the HH package

install.packages("HH") ## if you don't already have it.
library(HH)
demo(MMC.apple)

Please reply to the list if there are further queries.

Rich
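
As a point of comparison with the two-step residual approach quoted below, aov() accepts a numeric covariate in the same formula as the factors and the Error() term. The sketch below uses simulated data and made-up variable names patterned on the question; it has not been checked against the HH demo, and whether this model suits the actual design is a separate statistical question.

```r
# Simulated stand-in: 8 subjects, 2 within-subject conditions, 5 trials each
set.seed(42)
dat <- expand.grid(subj = factor(1:8),
                   withinVar = factor(c("A", "B")),
                   trial = 1:5)
dat$betweenVar <- factor(ifelse(as.integer(dat$subj) <= 4, "g1", "g2"))
dat$myMeasure <- rnorm(nrow(dat)) + 0.1 * dat$trial   # small practice effect

# Covariate (trial) entered alongside the factors, with a subject error term
m.aov <- aov(myMeasure ~ trial + withinVar * betweenVar +
               Error(subj / withinVar), data = dat)
summary(m.aov)
```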

On Sun, Dec 26, 2010 at 7:42 AM, Dror D Lev  wrote:

> Dear r helpers,
>
> I would like to look at the interaction between two two-level factors, one
> between and one within participants, after accounting for any variance due
> to practice (31 trials in each of two blocks) in the task.
> It seems to require treating practice as a covariate.
>
> All the examples I noticed for handling covariates (i.e. ANCOVA, including
> the ones in Faraway's "Practical regression and anova using r") use lm(),
> but this doesn't handle repeated-measures.
>
> I thought of a solution in the form of first running a regression on the
> covariate:
> > cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)
>
> and then run the aov() on the residuals:
> > m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
> Error(subj/withinVar), data=dat)
>
> Does it seem to be a valid answer to my problem?
>
> Is there an existing function that can do this (perhaps more
> appropriately)?
>
> Thank you for any help,
> dror
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] A question on Statistics

2010-12-26 Thread Maithula Chandrashekhar
I am not from a pure Statistics background, so please forgive me if
this question (which is not R related either) is too trivial.

In much of the Statistics literature I find the following statement: "restrictions in
different coefficients matrices have to be imposed to ensure uniqueness of
the parametrization". Can somebody tell me what uniqueness of the
parametrization means? Does it mean that two different coefficient
matrices may give exactly the same result, and therefore the coefficient matrix
is not unique?

I find there are many members (perhaps all) in this forum who are really
masters of Statistics. Therefore I hope somebody can clarify the
intuition behind that for me.

Thanks,



Re: [R] Lost in POSIX

2010-12-26 Thread David Winsemius


On Dec 25, 2010, at 2:25 PM, Dimitri Shvorob wrote:



df = structure(list(t = structure(c(1033963406.044, 1033974144.847,
+ 1033988418.836), class = c("POSIXt", "POSIXct"))), .Names = "t",
+ row.names = c(NA, 3L), class = "data.frame")
df$min = trunc(df$t,units="mins")

does not work,


??? seems to "work" on my system. Perhaps you should say what you mean  
by "not work"


> df
t min
1 2002-10-07 00:03:26 2002-10-07 00:03:00
2 2002-10-07 03:02:24 2002-10-07 03:02:00
3 2002-10-07 07:00:18 2002-10-07 07:00:00
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid  splines   stats graphics  grDevices utils  
datasets  methods   base


other attached packages:
 [1] nlme_3.1-97lme4_0.999375-37   Matrix_0.999375-46  
zoo_1.6-4  ggplot2_0.8.8  proto_0.3-8 
reshape_0.8.3  plyr_1.2.1 MASS_7.3-9
[10] rms_3.1-0  Hmisc_3.8-3survival_2.36-2 
sos_1.3-0  brew_1.0-4 lattice_0.19-13


loaded via a namespace (and not attached):
[1] cluster_1.13.2 stats4_2.12.1  tools_2.12.1



Jeff; you will see that my original post suggests familiarity
with 'trunc' :)


--
View this message in context: 
http://r.789695.n4.nabble.com/Lost-in-POSIX-tp3052768p3163914.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



Re: [R] Lost in POSIX

2010-12-26 Thread Jeff Newmiller

Dimitri Shvorob wrote:

.. One issue with the solution proposed by Jeff is that the transformed
column does not have the original's type:


x = structure(list(time = structure(c(1020232904.818, 1020232904.818
), class = c("POSIXt", "POSIXct"), tzone = ""), price = c(321, 
323.5), minute = c(1020232860, 1020232860)), .Names = c("time", 
"price", "minute"), row.names = 1:2, class = "data.frame")


minute <- function(t)
{ 
  d <- as.POSIXlt(t, origin = as.Date("1970-01-01")) 
  d$sec <- 0 
  as.POSIXct(d) 
} 

x$minute = sapply(x$time, minute)  



head(x)

 time price minute
1 2002-05-01 07:01:44 321.0 1020232860
2 2002-05-01 07:01:44 323.5 1020232860


class(x.l$minute)

[1] "numeric"



That is not an issue with the "minute" function, as you can see if you
evaluate

> minute(x$time)
[1] "2002-04-30 23:01:00 PDT" "2002-04-30 23:01:00 PDT"

or

> str(minute(x$time))
 POSIXct[1:2], format: "2002-04-30 23:01:00" "2002-04-30 23:01:00"

rather, you are seeing a side effect of "sapply".
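
The side effect is easy to reproduce: sapply() simplifies its results into a plain vector, which discards the POSIXct class, whereas calling the already vectorized function on the whole column keeps it. A short sketch, using UTC and the variable names `tm`, `via_sapply` and `direct` purely for illustration:

```r
tm <- as.POSIXct(c(1020232904.818, 1020232904.818),
                 origin = "1970-01-01", tz = "UTC")

# Same idea as the minute() function above: zero out the seconds
minute <- function(t) {
  d <- as.POSIXlt(t)
  d$sec <- 0
  as.POSIXct(d)
}

via_sapply <- sapply(tm, minute)   # simplification strips the class
direct     <- minute(tm)           # vectorized call keeps POSIXct

class(via_sapply)
class(direct)
```

So `x$minute <- minute(x$time)` would keep the column's date-time type.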

--
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] Lost in POSIX

2010-12-26 Thread Jeff Newmiller

Dimitri Shvorob wrote:
df = structure(list(t = structure(c(1033963406.044, 1033974144.847,
+ 1033988418.836), class = c("POSIXt", "POSIXct"))), .Names = "t",
+ row.names = c(NA, 3L), class = "data.frame")
df$min = trunc(df$t,units="mins")


does not work, Jeff; you will see that my original post suggests familiarity
with 'trunc' :) 


Well, perhaps you should read the error message or the "Value" section of 
?trunc.POSIXt, and convert the result to a compact type...


> df$min <- trunc( df$t, units="mins" )
Error in `$<-.data.frame`(`*tmp*`, "min", value = list(sec = 0, min = c(3L,  :
  replacement has 9 rows, data has 3
> df$min <- as.POSIXct( trunc( df$t, units="mins" ) )
> str(df)
'data.frame':   3 obs. of  2 variables:
 $ t  : POSIXct, format: "2002-10-06 21:03:26" "2002-10-07 00:02:24" ...
 $ min: POSIXct, format: "2002-10-06 21:03:00" "2002-10-07 00:02:00" ...
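
A small sketch of why the conversion matters, assuming UTC and the hypothetical names tt and lt: trunc() on a date-time returns a POSIXlt object, which is a list of components (sec, min, hour, ...), while a data frame column is better served by the compact POSIXct type.

```r
# The three time stamps from the post, rebuilt as POSIXct (UTC is an assumption)
tt <- as.POSIXct(c(1033963406.044, 1033974144.847, 1033988418.836),
                 origin = "1970-01-01", tz = "UTC")

lt <- trunc(tt, units = "mins")   # POSIXlt: a list-based representation
class(lt)
names(unclass(lt))                # the components stored inside it

df <- data.frame(t = tt)
df$min <- as.POSIXct(lt)          # compact type, one value per row
str(df)
```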


--
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] package "arules" - 'transpose' of the transactions

2010-12-26 Thread Michael Hahsler

Hi Kohleth,


Suppose this is my list of transactions:


set.seed(200)

tran=random.transactions(100,3)

inspect(tran)

  items     transactionID
1 {item80}  trans1
2 {item8,
   item20}  trans2
3 {item28}  trans3


I want to get the 'transpose' of the data, i.e.

  transactionID  items
1 {trans2}       item8
2 {trans2}       item20
3 {trans3}       item28
4 {trans1}       item80



This is not the transpose. The data structure you want can be created 
this way:


> l <- LIST(tran)
> single <- data.frame(ID=rep(names(l), sapply(l, length)),
+     items=unlist(l), row.names=NULL)

> single
  ID  items
1 trans1 item80
2 trans2  item8
3 trans2 item20
4 trans3 item28
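
The same reshaping can be tried without arules installed. The sketch below assumes a plain named list standing in for the result of LIST(tran), with the item names copied from the output above; it is an illustration of the idea, not a replacement for the arules call.

```r
# Hypothetical stand-in for LIST(tran): one character vector per transaction
l <- list(trans1 = "item80",
          trans2 = c("item8", "item20"),
          trans3 = "item28")

# Repeat each transaction ID once per item it contains, then flatten the items
single <- data.frame(ID    = rep(names(l), lengths(l)),
                     items = unlist(l, use.names = FALSE),
                     stringsAsFactors = FALSE)
single
```

Both rep() and unlist() are vectorized, so no explicit loop over transactions is involved.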



I tried converting tran into a matrix, then transpose it, then convert it
back to transactions. But my dataset is actually very very large, so I
wonder if there is any faster method?


The method above should be very fast.

-Michael



Thanks





--
  Dr. Michael Hahsler, Visiting Assistant Professor
  Department of Computer Science and Engineering
  Lyle School of Engineering
  Southern Methodist University, Dallas, Texas

  (214) 768-8878 * mhahs...@lyle.smu.edu * http://lyle.smu.edu/~mhahsler



[R] can't install R with *local* gcc

2010-12-26 Thread Oliver Kullmann
Hello,

we re-distribute R with our open-source platform http://www.ok-sat-library.org/
where we use R mainly for evaluation of computational experiments.
Due to the various platforms, we build everything from source, and that works 
fine.
Until now, that is: there are circumstances (for example in computer-science 
computer labs)
where no Fortran-compiler is provided, and the users (students) can't change 
that.
Thus we now try to build gfortran as part of the GCC version 4.2.4 suite, and 
building
R using that local gcc.
We already use the local C and C++ compiler of the suite extensively, and that
all works. But we don't have any experience with using gfortran.
The gcc-build works fine, everything seems alright --- only R (version 2.11.0) 
won't build with it:

We use the configuration

F77=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
 
FC=${F77} 
CC=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gcc
 
CXX=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/g++
 
LDFLAGS="-L 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib"
 
./configure 
--prefix=/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/R/2.11.0

(the same problems with "lib64" instead of "lib", by the way)

which yields

checking for Fortran 77 libraries of 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran...
  
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib
 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4
 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../lib64
 -L/lib/../lib64 -L/usr/lib/../lib64 
-L/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../..
 -lgfortranbegin -lgfortran -lm 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/lib/libgfortran.a

which looks alright to me (but I don't know Fortran), but then we get

checking for dummy main to link with Fortran 77 libraries... none
checking for Fortran 77 name-mangling scheme... lower case, underscore, no 
extra underscore
checking whether 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
 appends underscores to external names... yes
checking whether 
/home/csoliver/SAT-Algorithmen/OKplatform/ExternalSources/Installations/Gcc/4.2.4/bin/gfortran
 appends extra underscores to external names... no
checking whether mixed C/Fortran code can be run... configure: WARNING: cannot 
run mixed C/Fortran code
configure: error: Maybe check LDFLAGS for paths to Fortran libraries?
make: *** [R_base] Error 1

The R installation-documentation doesn't say much on using local compilers 
(more or less nothing), and everything we could
get from it are the above settings of environment variables.

Internet search reveals old stuff on "libg2c" which appears not to exist 
anymore, some recommendations
not to build from sources (which is not an option for us), an open Sage ticket 
(apparently without any
further work on it), and a request to the R-list with apparently no reply.

Since we are working in a well-defined setting (gcc is fully under our 
control), and apparently
all the libraries needed are built by gcc (though this is nowhere said or 
(dream) specified),
it should be possible to solve that problem.

I very much hope to get some hints (we can't get R running (for our system!) 
otherwise).
The error is exactly the same on various systems (all 64-bit machines, Intel 
and AMD).
If we use the system-gcc (4.5.0 or 4.1.2) then the installation of R works 
without problems;
here (for one of the machines) some data

> version
platform   x86_64-unknown-linux-gnu 
arch   x86_64   
os linux-gnu
system x86_64, linux-gnu
status  
major  2
minor  11.0 
year   2010 
month  04   
day22   
svn rev51801
language   R
version.string R version 2.11.0 (2010-04-22)

Thanks for your help in any case!

Oliver



Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Marius Hofert
Dear David,

thank you for your answer.
As I wrote, I am looking for an option to control the *space* between the tick 
marks and the corresponding labels. I am happy with the *number* of tick marks 
and their default values. As far as I know, pscales can't control the space, so 
it is *not* what I am looking for.

Cheers,

Marius

On 2010-12-26, at 14:36 , David Winsemius wrote:

> 
> On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:
> 
>> Dear expeRts,
>> 
>> how can I decrease the space between the tick marks and the corresponding 
>> labels in an splom?
>> See here:
>> 
>> library(lattice)
>> U <- matrix(runif(4000), ncol = 8)
>> splom(U, axis.text.cex = 0.2) # => space between the [small] tick labels and 
>> tick marks is/seems to be too large
> 
> So you want more tick marks?
> 
>> 
>> I checked ?panel.pairs but could not find an option for that.
> 
> What about the pscales argument?
> 
> A single number would increase the number of ticks, or a list with "at" and 
> "labels" values can be passed. Seem to be just what you asked for.
> 
> --
> 
> David Winsemius, MD
> West Hartford, CT
> 



Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 9:55 AM, Dror D Lev wrote:

Thank you David, for the reference to Dalgaard's paper in  
Rnews_2007-2.


Unfortunately I don't seem to have the mathematical-statistical  
sophistication required to adapt the example in Dalgaard's paper for  
my case.


I hope someone can suggest a less-mathematical direction for solution.


Here's what I would suggest if you want to stay more concrete. If you  
are not prepared to offer a minimal subset of your own data and also  
provide working or non-working code that uses it, then pick an  
available dataset that resembles it in structure and autocorrelation.  
One possibility would be the BodyWeight dataset in either the nlme or  
the MEMSS packages (although see below for my current level of  
uncertainty regarding your data).


require(nlme)
plot(BodyWeight)



Thanks again,
dror



On Sun, Dec 26, 2010 at 3:59 PM, David Winsemius > wrote:


On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote:

Dear r helpers,

I would like to look at the interaction between two two-level  
factors, one
between and one within participants, after accounting for any  
variance due

to practice (31 trials in each of two blocks) in the task.
It seems to require treating practice as a covariate.


I had trouble figuring out exactly what you meant by 31 trials in two  
blocks. Was that 31 trials by each participant? Or was it two trials  
by each of 31 participants divided unequally into two groups?


--
David.



All the examples I noticed for handling covariates (i.e. ANCOVA,  
including
the ones in Faraway's "Practical regression and anova using r") use  
lm(),

but this doesn't handle repeated-measures.

See if Dalgaard's piece in R-News offers better guidance:

http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf




I thought of a solution in the form of first running a regression on  
the

covariate:
cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)

and then run the aov() on the residuals:
m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem?

Is there an existing function that can do this (perhaps more  
appropriately)?


Thank you for any help,
dror
--

David Winsemius, MD
West Hartford, CT




David Winsemius, MD
West Hartford, CT



Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread Dror D Lev
Thank you David, for the reference to Dalgaard's paper in Rnews_2007-2.

Unfortunately I don't seem to have the mathematical-statistical
sophistication required to adapt the example in Dalgaard's paper for my
case.

I hope someone can suggest a less-mathematical direction for solution.

Thanks again,
dror



On Sun, Dec 26, 2010 at 3:59 PM, David Winsemius wrote:

>
> On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote:
>
>  Dear r helpers,
>>
>> I would like to look at the interaction between two two-level factors, one
>> between and one within participants, after accounting for any variance due
>> to practice (31 trials in each of two blocks) in the task.
>> It seems to require treating practice as a covariate.
>>
>> All the examples I noticed for handling covariates (i.e. ANCOVA, including
>> the ones in Faraway's "Practical regression and anova using r") use lm(),
>> but this doesn't handle repeated-measures.
>>
>
> See if Dalgaard's piece in R-News offers better guidance:
>
> http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf
>
>
>
>
>> I thought of a solution in the form of first running a regression on the
>> covariate:
>>
>>> cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)
>>>
>>
>> and then run the aov() on the residuals:
>>
>>> m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
>>> Error(subj/withinVar), data=dat)
>>
>> Does it seem to be a valid answer to my problem?
>>
>> Is there an existing function that can do this (perhaps more
>> appropriately)?
>>
>> Thank you for any help,
>> dror
>>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>



Re: [R] Performing basic Multiple Sequence Alignment in R?

2010-12-26 Thread Mike Marchywka



> From: marchy...@hotmail.com
> To: tal.gal...@gmail.com; r-help@r-project.org
> Subject: RE: [R] Performing basic Multiple Sequence Alignment in R?
> Date: Tue, 21 Dec 2010 17:03:17 -0500

> > From: tal.gal...@gmail.com
> > Date: Tue, 21 Dec 2010 20:17:18 +0200
> > Subject: Re: [R] Performing basic Multiple Sequence Alignment in R?
> > To: r-help@r-project.org
> >
> >
> > Dear Mike and Thomas,
> >
> > From what I gathered here (Thanks to Joris Meys):
> > http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
> > There is an R interface to the MUSCLE algorithm in the bio3d package
> > (function seqaln()).
> > But not one for clustal.
> >
> > I will probably end up using pairwiseAlignment on pairs of alignments
> > with some sort of stopping rules (I'll have to play with it to see how
> > it works).
>
>
> http://scholar.google.com/scholar?hl=en&q=%22exact+string+matching%22+alignment
>
> http://citeseerx.ist.psu.edu/search?q=exact+string+matching+alignment+dna&submit=Search&sort=rel
>
> Certainly if you are flexible and can use whatever may be close in R that
> is fine but I seem to recall that exact string matching was a fast and
> interesting way to go and maybe some of the authors above, in the interest
> of promoting their work, would help implement an R version if there is demand.
>
> I seem to recall I did something like building indexes of the strings to be
> aligned first, finding substrings that were unique to a given string but
> appeared only once in each of the sequences to be aligned (this was the most
> restrictive criterion, but you can imagine how to make it more accommodating).
> Now that you got me started, up-front tokenizing or compiling of input
> sequences (usually no more than indexing them in some way) made many later
> operations like alignment go faster. This may have ended up being similar to
> BLAST but now I can't really recall. Anyway, my point here is that somewhere
> in R there may be packages that generate intermediate forms useful across
> disciplines: mining data from text, linguistics, or macromolecule analysis.
> In fact, the indexing process helps find things that have migrated a long way
> from their original place and there are probably other non-alignment related
> things you could get out of the approach.
>


If you pursue this or make some decision would you please get back to
us, at least me off list? I just went back through my old code and hit the 
search links I posted above, this still seems like quite an interesting
area and the issues do not appear to be confined to bio. Looking at
my method names in my code, it looks like I had a way to supply fixed patterns,
probably from places like PROSITE or CDD, for use as the string you
probably meant to suggest although I seem to think it would make more sense
to discover these based on the strings it finds in the sequences.

I seem to recall I could do 2 sequences reasonably well with some quirks and
limitations but gave up when I tried to do multiple alignments (actually there
was no point at the time). Recent literature seems to still talk about
sub-quadratic time, although practically for large sequences the real execution
time could be dominated by VM, not algorithm order LOL. The indexing also makes
it possible to find related but distant strings, something that may be of
interest but not normally thought of as alignment between strings perturbed in
limited ways ("edit distance" being rather restricted to a few operations).
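
To make the indexing idea a bit more concrete, here is a rough base-R sketch. It is not my old code; the function names kmer_positions and unique_shared_anchors and the toy sequences are hypothetical. It indexes every substring of length k, keeps those that occur exactly once in each sequence, and reports where they sit, which is the restrictive "anchor" criterion described earlier in the thread.

```r
# Index all substrings of length k in one sequence: k-mer -> start positions
kmer_positions <- function(s, k) {
  n <- nchar(s) - k + 1
  if (n < 1) return(list())
  starts <- seq_len(n)
  split(starts, substring(s, starts, starts + k - 1))
}

# k-mers that appear exactly once in every sequence, with their positions
unique_shared_anchors <- function(seqs, k) {
  idx    <- lapply(seqs, kmer_positions, k = k)
  once   <- lapply(idx, function(p) names(p)[lengths(p) == 1])
  shared <- Reduce(intersect, once)
  pos    <- lapply(idx, function(p) unlist(p[shared], use.names = FALSE))
  matrix(as.integer(unlist(pos)), nrow = length(shared), ncol = length(seqs),
         dimnames = list(shared, names(seqs)))
}

seqs    <- c(a = "ACGTTGCA", b = "TTACGTGG")
anchors <- unique_shared_anchors(seqs, k = 4)
anchors   # one row per anchor k-mer, one column per sequence
```

Anchors found this way only pin down candidate matching regions; a real aligner would still have to fill in the stretches between them.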

If you find a specific paper or approach that seems to work that may be
of interest to many here and indeed may be implemented under some other name. 

Thanks.











Re: [R] What is the best way to lag a time series?

2010-12-26 Thread Bert Gunter
The correct answer to "How to lag..?" is almost certainly, "Don't."
Numerous time series packages and functions take
care of this automatically for you (using suitable data structures,
probably). Rather than trying to reinvent wheels, it might be wiser to
consult the Time Series Task View on CRAN to see what's there first.

Incidentally, my limited understanding is that modern time series
methods tend to use more appropriately specified covariance structures
(e.g. ARIMA models) rather than the lagged models of e.g. classical
econometrics. But on this, I would happily stand corrected.
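
For anyone who still needs explicit lags for lm(), a minimal base-R sketch follows. The simulated series and the column names x1 and y1 are assumptions made for illustration. It leans on the "ts" data structure so that the alignment of lagged values is handled by ts.intersect() rather than by hand:

```r
set.seed(1)
x <- ts(cumsum(rnorm(50)))
y <- ts(0.5 * x + rnorm(50))

# lag(x, -1) shifts the series so that at time t it holds x[t-1];
# ts.intersect() keeps only the time points where all series are defined
d <- ts.intersect(y, x,
                  x1 = stats::lag(x, -1),
                  y1 = stats::lag(y, -1),
                  dframe = TRUE)

fit <- lm(y ~ x + x1 + y1, data = d)
coef(fit)
```

The first observation is lost to the lag, so the fit uses 49 rows. The resulting lm object can be handed to other functions that accept fitted linear models.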

-- Cheers,
 Bert

On Sun, Dec 26, 2010 at 12:21 AM, Liviu Andronic  wrote:
> On Sun, Dec 26, 2010 at 8:49 AM, Christian Schoder
>  wrote:
>> Dear R-users,
>>
>> I've been using R for a while and I am very satisfied! Unfortunately, I
>> still have not figured out an efficient and general way to construct and
>> use lags of time series, especially when I need to work with different
>> packages.
>>
>> Let me give an example. I have two time series x and y and I want to
>> estimate a variety of distributed lags models and run different tests
>> (autocorrelation, etc). It is obvious that I need to be able to lag x
>> and y in a flexible way. So far, my temporary solution was to construct
>> the lags manually (x1,..,xn and y1,..,yn) in a spreadsheet and import it
>> to R, which is not very satisfactory because it does not allow for much
>> flexibility.
>>
>> Is there a straightforward command which allows me to easily construct a
>> lag
>>
> Perhaps ?diff.
>
> Liviu
>
>
>> when required and which allows me to, for example, use the lm()
>> command to fit a dynamic model and the bgtest() command to perform the
>> breusch-godfrey test on the same model?
>>
>> Is it advisable to use time series objects which consist of many time
>> series (like a dataframe) or is it better to have it contain only one
>> time series?
>>
>> I would be grateful for any hints and links.
>>
>> Thx!
>> Christian
>>
>>
>
>
>
> --
> Do you know how to read?
> http://www.alienetworks.com/srtest.cfm
> http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
> Do you know how to write?
> http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
>
>



-- 
Bert Gunter
Genentech Nonclinical Biostatistics



[R] Calculation of BIC done by leaps-package

2010-12-26 Thread Jan Henckens

Hi Folks,

I've got a question concerning the calculation of the Schwarz-Criterion 
(BIC) done by summary.regsubsets() of the leaps-package:


Using regsubsets() to perform subset-selection I receive an regsubsets 
object that can be summarized by summary.regsubsets(). After this 
operation the resulting summary contains a vector of BIC-values 
representing models of size i=1,...,K.


My problem is that I can't reproduce the calculation of these BIC 
values. I already tried to use extractAIC(...,k=log(n)), 
AIC(...,k=log(n)) and manual calculation using the RSS-vector but none 
matches the calculation done by the summary-function. I already checked 
for constants that could be the reason for the differences, but I found 
that the values differ by more than an additive constant.



The source code of the leaps-package states the package calculates the 
BIC this way:


bicvec<-c(bicvec,(n1+ll$intercept)*log(vr)+i*log(n1+ll$intercept))

with:

## number of observations - Intercept:
n1<-ll$nn-ll$intercept
## ratio of the sum of squared residuals of model i
## to the sum of squared residuals of the null model; I
## just can't understand why the vector ll$ress
## is doubly subscripted
vr<-ll$ress[i,j]/ll$nullrss
## maximum number of variables
i

^^ This seems to match the calculation done by extractAIC but it doesn't!
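
As a cross-check that needs only base R, the sketch below applies the quoted formula by hand and compares it with extractAIC(). It assumes an intercept and no weights, uses the built-in mtcars data instead of the bridge data, and the names bic.quoted, bic.extract and offset are made up for the example. Under those assumptions (n1 + intercept equals n), the two quantities should differ only by the constant n*log(RSS0/n) + log(n):

```r
n    <- nrow(mtcars)
rss0 <- deviance(lm(mpg ~ 1, data = mtcars))   # null model RSS

fits <- list(lm(mpg ~ wt,      data = mtcars),
             lm(mpg ~ wt + hp, data = mtcars))
rss  <- sapply(fits, deviance)
size <- c(1, 2)                                # number of predictors

# Formula as quoted from the package source, with n1 + intercept = n
bic.quoted  <- n * log(rss / rss0) + size * log(n)

# extractAIC() uses n*log(RSS/n) + k*edf, where edf counts the intercept
bic.extract <- sapply(fits, function(f) extractAIC(f, k = log(n))[2])

offset <- bic.extract - bic.quoted
offset                                         # same value for both models
n * log(rss0 / n) + log(n)                     # the expected constant
```

If the gap is not constant on the real data, one thing worth checking is whether the lm() fits contain exactly the variables that the forward search selected at each size.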

Maybe anyone can tell me about the reason of the variation of the 
BIC-values?


Best regards,
Jan Henckens



### Minimal Example:
require(leaps)
bridge <- 
read.table("http://www.stat.tamu.edu/~sheather/book/docs/datasets/bridge.txt", 
header=TRUE)

fmla.full <- formula(Time ~ .)
(lm.model <- summary(regsubsets(fmla.full,data=bridge,weights=NULL, 
intercept=TRUE, method="forward")))

lm.model$bic
### The first two models constructed via lm():
extractAIC(lm(Time~Dwgs,data=bridge),k=log(nrow(bridge)))
extractAIC(lm(Time~Dwgs+Case,data=bridge),k=log(nrow(bridge)))

or see

http://www.henckens.de/min_example.R



--
jan.henckens | jöllenbecker str. 58 | 33613 bielefeld | germany
tel 0521-5251970



Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Berend Hasselman


bbslover wrote:
> 
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
> 
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>for (j in 1:num.compd)
>   if (i!=j) {
> S1<-sum(x[i,]*x[j,])
> S2<-sum(x[i,]^2)
> S3<-sum(x[j,]^2)
> sim2<-S1/(S2+S3-S1)
> diss2<-1-sim2
> diss.all<-diss.all+diss2}
> 
> it will cost a long time to finish this computation! i really need "rapid"
> code to replace my code.
> 

Alternative 1:  j-loop only needs to start at i+1 so

diss2.all <- 0
for( i in 1:num.compd) {
    for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
        S1<-sum(x[i,]*x[j,])
        S2<-sum(x[i,]^2)
        S3<-sum(x[j,]^2)
        sim2<-S1/(S2+S3-S1)
        diss2<-1-sim2
        diss2.all<-diss2.all+diss2
    }
}
diss2.all <- 2 * diss2.all

On my pc this is about twice as fast as your version (with 202 samples and
263 variables)

Alternative 2: all sum() are not necessary. Use some matrix algebra:

xtx <- x %*% t(x)
diss3.all <- 0
for( i in 1:num.compd) {
for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
S1 <- xtx[i,j]
S2 <- xtx[i,i]
S3 <- xtx[j,j]
sim2<-S1/(S2+S3-S1)
diss2<-1-sim2
diss3.all<-diss3.all+diss2
}
}
diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1.

I'm quite sure that more expert R gurus can get some more speed up.
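
One possible loop-free version, offered as a sketch rather than a tuned solution (the names g, d, diss and the small test matrix are assumptions for the example): once all row inner products are in one matrix, S2 + S3 for every pair is an outer sum of its diagonal, and the whole double loop collapses to a few matrix operations. The original loop is kept below only to check the result on a small input.

```r
set.seed(1)
x <- matrix(runif(20 * 30), nrow = 20)   # small stand-in for the 202 x 263 matrix

g    <- tcrossprod(x)                    # g[i, j] = sum(x[i, ] * x[j, ])
d    <- diag(g)                          # d[i]    = sum(x[i, ]^2)
diss <- 1 - g / (outer(d, d, "+") - g)   # all pairwise dissimilarities at once
diag(diss) <- 0                          # the i == j terms are excluded
diss.vec <- sum(diss)

# Reference: the original double loop
num.compd <- nrow(x)
diss.all <- 0
for (i in 1:num.compd)
  for (j in 1:num.compd)
    if (i != j) {
      S1 <- sum(x[i, ] * x[j, ])
      S2 <- sum(x[i, ]^2)
      S3 <- sum(x[j, ]^2)
      diss.all <- diss.all + (1 - S1 / (S2 + S3 - S1))
    }

all.equal(diss.vec, diss.all)
```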

Note: I generated the x matrix with:
set.seed(1);x<-matrix(runif(202*263),nrow=202)
(Timings on iMac 2.16Ghz and using 64-bit R)

Berend

-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164262.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 7:42 AM, Dror D Lev wrote:


Dear r helpers,

I would like to look at the interaction between two two-level  
factors, one
between and one within participants, after accounting for any  
variance due

to practice (31 trials in each of two blocks) in the task.
It seems to require treating practice as a covariate.

All the examples I noticed for handling covariates (i.e. ANCOVA,  
including
the ones in Faraway's "Practical regression and anova using r") use  
lm(),

but this doesn't handle repeated-measures.


See if Dalgaard's piece in R-News offers better guidance:

http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf




I thought of a solution in the form of first running a regression on  
the

covariate:

cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)


and then run the aov() on the residuals:

m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem?

Is there an existing function that can do this (perhaps more  
appropriately)?


Thank you for any help,
dror

--

David Winsemius, MD
West Hartford, CT



[R] T2 hoteling

2010-12-26 Thread leyla khodakarim
Dear All

It is very kind of you to guide me.

When I want to run this line, I see this error

stat.obs <- apply(GS, 2, function(z) Hott2(t(DATA[which(z==1),]), cl))

Error in colSums(w * x) : 'x' must be an array of at least two dimensions

cl <- as.factor(y)

GS: a matrix with 0 or 1

GS: gene sets

-> a data matrix with rows=genes,

columns= gene sets,

GS[i,j]=1 if gene i in gene set j

GS[i,j]=0 otherwise

Hott2 <- function(x, y, var.equal=TRUE) #T2 hoteling

Y<- c(1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,1,0,1,0,1)

Data=transpose(X)= gene expression: row=40 gene, column=10 sample

Data: there is in attachment file

Thanks a lot


-


Re: [R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 5:41 AM, Marius Hofert wrote:


Dear expeRts,

how can I decrease the space between the tick marks and the  
corresponding labels in an splom?

See here:

library(lattice)
U <- matrix(runif(4000), ncol = 8)
splom(U, axis.text.cex = 0.2) # => space between the [small] tick  
labels and tick marks is/seems to be too large


So you want more tick marks?



I checked ?panel.pairs but could not find an option for that.


What about the pscales argument?

A single number would increase the number of ticks, or a list with  
"at" and "labels" values can be passed. Seem to be just what you asked  
for.


--

David Winsemius, MD
West Hartford, CT



Re: [R] object names from character strings

2010-12-26 Thread jim holtman
Consider storing the dataframes in a list so that you do not have to
create unique names and it will also give you better control by
keeping all the data together in one object.
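
A minimal sketch of that suggestion, with made-up calculations and the hypothetical names results and combined: each iteration stores its data frame in a named list element, so the character string becomes the element name rather than a workspace object name.

```r
results <- list()
for (i in 1:3) {
  # stand-in for the real per-iteration calculation
  df <- data.frame(id = i, value = (1:i)^2)
  results[[paste("run", i, sep = ".")]] <- df
}

names(results)            # the generated names
results[["run.2"]]        # retrieve one data frame by its name

# everything is in one object, so it is easy to process in bulk
sapply(results, nrow)
combined <- do.call(rbind, results)
```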

On Sun, Dec 26, 2010 at 4:04 AM, Jim Bouldin  wrote:
> I realize this is probably pretty basic but I can't figure it out.
>
> I'm looping through an array, doing various calculations and producing a
> resulting data frame in each loop iteration.  I need to give each data frame
> a different name.  Although I can easily create a new character string for
> writing each frame to an output file, I cannot figure out how to convert
> such strings to corresponding object names within the R workspace itself, so
> as to give each d.f. a distinct name.  The closest I got were various
> attempts with the as.name function, but couldn't get that to work either.
>  Any help appreciated.  Thanks.
>
> --
> Jim Bouldin, PhD
> Research Ecologist
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] object names from character strings

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 4:04 AM, Jim Bouldin wrote:


I realize this is probably pretty basic but I can't figure it out.

I'm looping through an array, doing various calculations and  
producing a resulting data frame in each loop iteration.  I need to  
give each data frame a different name.  Although I can easily create  
a new character string for writing each frame to an output file, I  
cannot figure out how to convert such strings to corresponding  
object names within the R workspace itself, so as to give each d.f.  
a distinct name.  The closest I got were various attempts with the  
as.name function, but couldn't get that to work either.  Any help  
appreciated.  Thanks.


Here's the first example in the help(assign) page:

for(i in 1:6) { #-- Create objects 'r.1', 'r.2', ... 'r.6'
    nam <- paste("r",i, sep=".")
    assign(nam, 1:i) }
ls(pattern = "^r..$")




--
Jim Bouldin, PhD
Research Ecologist



David Winsemius, MD
West Hartford, CT



Re: [R] R2WinBugs data import error

2010-12-26 Thread David Winsemius


On Dec 26, 2010, at 12:44 AM, unsown wrote:



For some purpose, I  need to transfer a NAs array to WinBugs through
R2WinBugs, But I constantly got an error message:"'type' must be  
"real" for

this format". Here is my data to transfer:

x = matrix(data=NA,nrow=3,ncol=3)


str(x)
It is of mode "logical".

Try instead:
x = matrix(vector(mode="numeric",0) ,nrow=3,ncol=3)
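
A quick way to see the difference, sketched with the hypothetical names x_lgl, x_num and x_conv: a bare NA is logical, so a matrix filled with it is stored as logical, while NA_real_ (or a change of storage mode) gives the same all-missing matrix stored as double.

```r
x_lgl <- matrix(NA, nrow = 3, ncol = 3)        # what the post created
x_num <- matrix(NA_real_, nrow = 3, ncol = 3)  # all-NA, but numeric

typeof(x_lgl)
typeof(x_num)

# An existing logical NA matrix can also be converted in place
x_conv <- x_lgl
storage.mode(x_conv) <- "double"
typeof(x_conv)
```

This would also be consistent with the observation that setting x[1,1] = 0 makes the problem go away, since assigning a number coerces the whole matrix to numeric.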



x =  as.array(x)
data <- list ("x")


Why are you making a list with a single character element? If you need  
to pass the matrix you just created in a list, then try (and don't use  
"data" as the name):


dat <- list(x)





if I add a line to above setting, then I can pass R2WinBugs:

x[1,1] = 0

If I manually input the NA array to WinBugs, I could get it running.  
So my

original data set has no problem with WinBugs.
--



David Winsemius, MD
West Hartford, CT



[R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

Dear all,

My double for loop is shown below, but it is not very efficient. I hope
someone can give me a "vectorized" program to replace my code. Thanks.

x: is a matrix  202*263,  that is 202 samples, and 263 independent variables

num.compd<-nrow(x); # number of compounds
diss.all<-0
for( i in 1:num.compd)
   for (j in 1:num.compd)
  if (i!=j) {
S1<-sum(x[i,]*x[j,])
S2<-sum(x[i,]^2)
S3<-sum(x[j,]^2)
sim2<-S1/(S2+S3-S1)
diss2<-1-sim2
diss.all<-diss.all+diss2}

it will cost a long time to finish this computation! i really need "rapid"
code to replace my code.

thanks

kevin
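
A sketch of a vectorized equivalent, following the tcrossprod() approach posted by Berend Hasselman in this thread. The small random matrix and the function names loop_diss and vec_diss are assumptions made for illustration; the real x is 202 x 263.

```r
set.seed(1)
x <- matrix(runif(20 * 5), nrow = 20)   # stand-in data, not the real matrix

# The original double loop, wrapped in a function for comparison
loop_diss <- function(x) {
  n <- nrow(x)
  total <- 0
  for (i in seq_len(n)) for (j in seq_len(n)) if (i != j) {
    s1 <- sum(x[i, ] * x[j, ])
    s2 <- sum(x[i, ]^2)
    s3 <- sum(x[j, ]^2)
    total <- total + (1 - s1 / (s2 + s3 - s1))
  }
  total
}

# Matrix version: g[i, j] = sum(x[i, ] * x[j, ]), diag(g) = row squared norms
vec_diss <- function(x) {
  g <- tcrossprod(x)
  d <- diag(g)
  sim <- g / (outer(d, d, "+") - g)
  # On the diagonal sim is 1, so those terms add 0 to the sum
  # (an all-zero row would give NaN there, as it would divide 0 by 0)
  sum(1 - sim)
}

all.equal(loop_diss(x), vec_diss(x))
```

The i != j condition needs no special handling because the diagonal terms contribute zero.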


-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
Sent from the R help mailing list archive at Nabble.com.



[R] Doing a mixed-ANOVA after accounting for a covariate

2010-12-26 Thread Dror D Lev
Dear r helpers,

I would like to look at the interaction between two two-level factors, one
between and one within participants, after accounting for any variance due
to practice (31 trials in each of two blocks) in the task.
It seems to require treating practice as a covariate.

All the examples I noticed for handling covariates (i.e. ANCOVA, including
the ones in Faraway's "Practical regression and anova using r") use lm(),
but this doesn't handle repeated-measures.

I thought of a solution in the form of first running a regression on the
covariate:
> cov.accnt = lm (myMeasure ~ myCovMeasure, data=dat)

and then running aov() on the residuals:
> m.aov = aov (cov.accnt$residuals ~ withinVar*betweenVar +
Error(subj/withinVar), data=dat)

Does it seem to be a valid answer to my problem?

Is there an existing function that can do this (perhaps more appropriately)?
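
For concreteness, the two-step idea above can be run on simulated data as follows. Everything about the data (8 subjects, the group split, the effect sizes, the resid column) is an assumption made up for illustration, and the sketch only shows that the calls fit together; it does not settle whether the approach is statistically valid.

```r
set.seed(123)
n_subj <- 8
n_trials <- 31

# Simulated design: 31 trials in each of two blocks per subject
dat <- expand.grid(trial = seq_len(n_trials),
                   withinVar = factor(c("block1", "block2")),
                   subj = factor(seq_len(n_subj)))
dat$betweenVar <- factor(ifelse(as.integer(dat$subj) <= n_subj / 2, "g1", "g2"))
dat$myCovMeasure <- dat$trial                       # practice covariate
dat$myMeasure <- 500 - 2 * dat$trial + rnorm(nrow(dat), sd = 20)

# Step 1: regress the measure on the covariate
cov.accnt <- lm(myMeasure ~ myCovMeasure, data = dat)
dat$resid <- residuals(cov.accnt)

# Step 2: repeated-measures aov() on the residuals
m.aov <- aov(resid ~ withinVar * betweenVar + Error(subj / withinVar),
             data = dat)
summary(m.aov)
```

Storing the residuals as a column of dat keeps them aligned with the factors that aov() looks up in the same data frame.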

Thank you for any help,
dror




[R] lattice splom: how to adjust space between tick marks and tick labels?

2010-12-26 Thread Marius Hofert
Dear expeRts,

how can I decrease the space between the tick marks and the corresponding 
labels in an splom?
See here:

library(lattice)
U <- matrix(runif(4000), ncol = 8)
splom(U, axis.text.cex = 0.2)
# => space between the [small] tick labels and tick marks is/seems to be too large

I checked ?panel.pairs but could not find an option for that.

Cheers,

Marius


[R] Fitting mixtures with non-linear parameters constraints

2010-12-26 Thread Jonathan Rosenblatt
Dear R users

Does anyone happen to know a function to fit a Gaussian mixture using
*non-linear* constraints between the parameters? (An EM that allows
that will obviously do the job.)

Thank you in advance

--
Jonathan Rosenblatt
www.john-ros.com



[R] R2WinBugs data import error

2010-12-26 Thread unsown

For some purpose, I need to transfer an array of NAs to WinBugs through
R2WinBugs, but I keep getting an error message: "'type' must be "real" for
this format". Here is my data to transfer:

x = matrix(data=NA,nrow=3,ncol=3)
x =  as.array(x)
data <- list ("x")

If I add one line to the code above, then R2WinBugs accepts the data:

x[1,1] = 0

If I manually input the NA array to WinBugs, I could get it running. So my
original data set has no problem with WinBugs.
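
A short base-R sketch of why the extra assignment probably changes the outcome: it assumes the error is tied to how the matrix is stored. Assigning a number into a logical matrix coerces the whole matrix to double, and storage.mode<- does the same without altering any values. The name y is only illustrative.

```r
x <- matrix(data = NA, nrow = 3, ncol = 3)
storage.mode(x)            # "logical"

x[1, 1] <- 0               # assigning a number coerces the whole matrix
storage.mode(x)            # "double"

# Converting the storage mode directly keeps every cell NA
y <- matrix(data = NA, nrow = 3, ncol = 3)
storage.mode(y) <- "double"
storage.mode(y)            # "double"
```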
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R2WinBugs-data-import-error-tp3164106p3164106.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to specify ff object filepaths when reading a CSV file into a ff data frame.

2010-12-26 Thread Xiaobo Gu
Hi, I have done another simple test: I tried the two syntaxes against a
CSV file with only one column, and both succeeded:

> fdf <- read.csv.ffdf(file="D:/rtemp/fftest2.csv",asffdf_args = list( col_args
+ =  list(filename=c("F:/a.f"))))
> fdf
ffdf (all open) dim=c(2,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
     PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
col1         col1      integer       integer FALSE           FALSE
     PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
col1            FALSE                 1                1               1
     PhysicalIsOpen
col1           TRUE
ffdf data
  col1
1    1
2    2


> fdf <- read.csv.ffdf(file="D:/rtemp/fftest2.csv",asffdf_args = list( col_args
+ =  c(list(filename="D:/a2.f"))))
> fdf
ffdf (all open) dim=c(2,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
     PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
col1         col1      integer       integer FALSE           FALSE
     PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
col1            FALSE                 1                1               1
     PhysicalIsOpen
col1           TRUE
ffdf data
  col1
1    1
2    2
>

Regards,

Xiaobo Gu



On Fri, Dec 24, 2010 at 11:27 PM, Xiaobo Gu  wrote:
> Hi,
>    The read.csv.ffdf function in package ff will create the ff object
> physical file in the default directories, I am trying to let the files
> created in the paths users specify, I think the point is to make use
> of the asffdf_args parameter,
> I have a test CSV file named D:\rtemp\fftest.csv, the content of the
> file is as following:
>
> col1,col2,col3
> 1,"amber",2.4
> 2,"linda",4.5
>
> I tried the following code, hoping ff will create the physical files
> for col1,col2 and col3 to D:/a.f,D:/b.f,D:/c.f respectively
>
>  fdf <- read.csv.ffdf(file="D:/rtemp/fftest.csv",asffdf_args = list(
> col_args =  c(list(filename="D:/a.f"), list(filename="D:/b.f"),
> list(filename="D:/c.f"))))
> and the error message is :
> Error in as.ff.default(1:2, vmode = NULL, filename = "D:/a.f",
> filename = "D:/b.f",  :
>  formal argument "filename" matched by multiple actual arguments
>
> I also tried the following:
>
>> fdf <- read.csv.ffdf(file="D:/rtemp/fftest.csv",asffdf_args = list( col_args
>> =  list(filename=c("D:/a.f","D:/b.f","D:/c.f"))))
> Error in ff(initdata = initdata, length = length, levels = levels,
> ordered = ordered,  :
>  bad argument initdata for existing file; initializing existing file is 
> invalid
> In addition: Warning messages:
> 1: In if (file.exists(filename)) { :
>  the condition has length > 1 and only the first element will be used
> 2: In if (file.exists(filename)) { :
>  the condition has length > 1 and only the first element will be used
> 3: In if (file.access(filename, 4) == -1) { :
>  the condition has length > 1 and only the first element will be used
> 4: In if (file.access(filename, 2) == -1) { :
>  the condition has length > 1 and only the first element will be used
> 5: In if (is.na(filesize)) stop("unable to open file") :
>  the condition has length > 1 and only the first element will be used
>
> My questions are:
> 1. What's the datatype of the col_args parameter of the as.ffdf function
> 2. If I can make layout of the asffdf_args parameter correct, how can
> I set the exact filenames for each column of the ff data frame.
>
> Regards,
>
> Xiaobo Gu
>



Re: [R] What is the best way to lag a time series?

2010-12-26 Thread Patrick Burns

First off, there are data manipulation
techniques that will beat doing it in
a spreadsheet.  For example:

head(x, -1)

is lagged 1 relative to

tail(x, -1)

But I think you are really looking for
'Lag' in the 'quantmod' package.
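
As a sketch of the head()/tail() idea above, using base R only: the series x is simulated (an assumption), and embed() is shown as one base-R way to build several lags at once for lm(). Lag() from quantmod is not used here.

```r
set.seed(42)
x <- cumsum(rnorm(10))       # simulated series, for illustration only

x_lag1 <- head(x, -1)        # x[1], ..., x[n-1]
x_now  <- tail(x, -1)        # x[2], ..., x[n]
fit1 <- lm(x_now ~ x_lag1)   # regress x[t] on x[t-1]

# embed() puts x[t], x[t-1], x[t-2] in columns 1, 2, 3
lags <- embed(x, 3)
fit2 <- lm(lags[, 1] ~ lags[, 2] + lags[, 3])
coef(fit2)
```

Each extra lag drops one observation from the start of the series, so the rows of lags begin at t = 3.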

On 26/12/2010 07:49, Christian Schoder wrote:

> Dear R-users,
>
> I've been using R for a while and I am very satisfied! Unfortunately, I
> still have not figured out an efficient and general way to construct and
> use lags of time series, especially when I need to work with different
> packages.
>
> Let me give an example. I have two time series x and y and I want to
> estimate a variety of distributed lag models and run different tests
> (autocorrelation, etc). It is obvious that I need to be able to lag x
> and y in a flexible way. So far, my temporary solution was to construct
> the lags manually (x1,..,xn and y1,..,yn) in a spreadsheet and import it
> to R, which is not very satisfactory because it does not allow for much
> flexibility.
>
> Is there a straightforward command which allows me to easily construct a
> lag when required and which allows me to, for example, use the lm()
> command to fit a dynamic model and the bgtest() command to perform the
> Breusch-Godfrey test on the same model?
>
> Is it advisable to use time series objects which consist of many time
> series (like a dataframe) or is it better to have it contain only one
> time series?
>
> I would be grateful for any hints and links.
>
> Thx!
> Christian




--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



[R] object names from character strings

2010-12-26 Thread Jim Bouldin

I realize this is probably pretty basic but I can't figure it out.

I'm looping through an array, doing various calculations and producing a 
resulting data frame in each loop iteration.  I need to give each data 
frame a different name.  Although I can easily create a new character 
string for writing each frame to an output file, I cannot figure out how 
to convert such strings to corresponding object names within the R 
workspace itself, so as to give each d.f. a distinct name.  The closest 
I got were various attempts with the as.name function, but couldn't get 
that to work either.  Any help appreciated.  Thanks.
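
A minimal sketch of two base-R ways to do this, with made-up data frames and the hypothetical names frame_1, frame_2, frame_3. assign() creates an object whose name comes from a string and get() retrieves it; a named list holds the same results without creating separate workspace objects.

```r
results <- list()

for (i in 1:3) {
  df <- data.frame(id = i, value = i^2)   # stand-in for the real calculations
  nm <- paste("frame", i, sep = "_")      # "frame_1", "frame_2", ...

  assign(nm, df)        # creates an object called frame_1, frame_2, ...
  results[[nm]] <- df   # alternative: keep every data frame in one named list
}

get("frame_2")          # look an object up by its name as a string
names(results)
```

The list version is often easier to loop over later, for example with lapply(results, summary).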


--
Jim Bouldin, PhD
Research Ecologist



Re: [R] What is the best way to lag a time series?

2010-12-26 Thread Liviu Andronic
On Sun, Dec 26, 2010 at 8:49 AM, Christian Schoder wrote:
> Dear R-users,
>
> I've been using R for a while and I am very satisfied! Unfortunately, I
> still have not figured out an efficient and general way to construct and
> use lags of time series, especially when I need to work with different
> packages.
>
> Let me give an example. I have two time series x and y and I want to
> estimate a variety of distributed lag models and run different tests
> (autocorrelation, etc). It is obvious that I need to be able to lag x
> and y in a flexible way. So far, my temporary solution was to construct
> the lags manually (x1,..,xn and y1,..,yn) in a spreadsheet and import it
> to R, which is not very satisfactory because it does not allow for much
> flexibility.
>
> Is there a straightforward command which allows me to easily construct a
> lag
>
Perhaps ?diff.

Liviu


> when required and which allows me to, for example, use the lm()
> command to fit a dynamic model and the bgtest() command to perform the
> breusch-godfrey test on the same model?
>
> Is it advisable to use time series objects which consist of many time
> series (like a dataframe) or is it better to have it contain only one
> time series?
>
> I would be grateful for any hints and links.
>
> Thx!
> Christian
>
>



-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
