Have you considered the 'CHNOSZ' package?

> library(CHNOSZ)
> makeup("C5H11BrO")
   count
C      5
H     11
Br     1
O      1


      I found this using the 'sos' package as follows:


library(sos)
cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
cf


The print method for "cf" opened the results in a web browser, which showed that the "CHNOSZ" package had 14 of these 21 matches, and the other 7 were in 7 different packages. Moreover, the "CHNOSZ" package is devoted to "Chemical Thermodynamics and Activity Diagrams" and provides many more capabilities that might interest you.


      Hope this helps.
      Spencer


On 12/26/2010 5:01 PM, Bryan Hanson wrote:
Well let me just say thanks and WOW! Four great ideas, each worthy of study and I'll learn several things from each. Interestingly, these solutions seem more general and more compact than the solutions I found on the 'net using python and perl. More evidence for the power of R! A big thanks to each of you! Bryan

On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:

On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson <han...@depauw.edu> wrote:
Hello R Folks...

I've been looking around the 'net and I see many complex solutions in
various languages to this question, but I have a pretty simple need (and I'm not much good at regex). I want to use a chemical formula as a function argument. The formula would be in "Hill order" which is to list C, then H, then all other elements in alphabetical order. My example will have only a limited number of elements, few enough that one can search directly for each element. So some examples would be C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or Br, there is no following number meaning a 1 is
implied).

Let's say

form <- "C5H11BrO"

I'd like to get the count of each element, so in this case I need to extract
C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular
weight by multiplying). Sounds pretty simple, but my experiments with grep and strsplit don't immediately clue me into an obvious solution. As I said,
I don't need a general solution to the problem of calculating molecular
weight from an arbitrary formula, that seems quite challenging, just a way to convert "form" into a list or data frame which I can then do the math on.

Here's hoping this is a simple issue for more experienced R users! TIA,

This can be done by strapply in gsubfn.  It matches the regular
expression to the target string passing the back references (the
parenthesized portions of the regular expression) through a specified
function as successive arguments.

Here the first arg is form, your input string.  The second arg is the
regular expression, which matches an upper case letter, optionally
followed by lower case letters, with all of that optionally followed
by digits.  The third arg is a function written in formula notation;
strapply passes the back references (i.e. the portions within
parentheses) to it as its two arguments, and it supplies a count of 1
when no digits follow the element symbol.  The simplify arg is another
function in formula notation, which turns the result into a matrix
and then a data frame.  Finally we make the second column of the data
frame numeric.

library(gsubfn)

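DF <- strapply(form,
  "([A-Z][a-z]*)(\\d*)",
  # ..1 is the element symbol, ..2 the count; an empty count means 1
  ~ c(..1, if (nchar(..2)) ..2 else 1),
  simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
DF[[2]] <- as.numeric(DF[[2]])  # counts arrive as character strings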

DF looks like this:

> DF
  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1
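
For the molecular weight step mentioned in the question, here is a
minimal base-R sketch that does not need gsubfn.  It is only an
illustration, not part of the solution above: the names tokens,
element, count, atomic_weight and mw are made up for this example,
the atomic weights are approximate standard values typed in by hand,
and it assumes a flat formula with no parentheses or charges.

```r
# Sketch only: parse a simple Hill-order formula with base R.
form <- "C5H11BrO"

# Each token is an element symbol optionally followed by a count.
tokens <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]

element <- sub("[0-9]+$", "", tokens)     # drop the trailing digits
count   <- sub("^[A-Za-z]+", "", tokens)  # drop the leading symbol
count[count == ""] <- "1"                 # a missing count means 1
count   <- as.numeric(count)

# Approximate atomic weights (assumed values, only for these elements).
atomic_weight <- c(C = 12.011, H = 1.008, Br = 79.904, O = 15.999)

# Multiply each count by its atomic weight and add up the terms.
mw <- sum(count * atomic_weight[element])
```

For "C5H11BrO" this should give a molecular weight of roughly 167.
An element missing from atomic_weight would yield NA, so a real
version would want a complete table and a check for unknown symbols.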



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
