Hi,

If you really wanted precision (significant figures) rather than decimal places, it would be easy: format() handles that, I believe.
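
As a rough illustration of that distinction, here is a minimal sketch using base R's signif(), round() and format(). The variable names are only for illustration and are not from the original data.

```r
# Sketch only: significant figures versus decimal places in base R.
x <- c(5.321, 0.012345)

# signif() rounds to a given number of significant figures.
sig <- signif(x, 2)   # 5.3 and 0.012

# round() rounds to a given number of decimal places.
dec <- round(x, 2)    # 5.32 and 0.01

# format() with digits= also counts significant figures.
# Each value is formatted separately so they do not share a common width.
txt <- vapply(x, function(v) format(v, digits = 2), character(1))

print(sig)
print(dec)
print(txt)
```

Note that format() applied to a whole vector picks one common format for all elements, which is why the sketch formats each value on its own.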

Your original email said you'd been reading about regular expressions;
continuing that reading will lead you to the meaning of the cryptic ^
and all the \. As for the final ., you're right: I didn't think about
having nothing following the decimal place. It's much easier to do in
two steps:

> testdata <- data.frame(values=c("10,000.0", "5.321", "1.1"), digits=c(0, 1, 2))
> intermediate <- apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
> intermediate
[1] "10,000." "5.3"     "1.1"
> sub("\\.$", "", intermediate)
[1] "10,000" "5.3"    "1.1"

Sarah

On Wed, Dec 7, 2011 at 8:20 AM, Aidan Corcoran <aidan.corcora...@gmail.com> wrote:
> Hi Sarah,
>
> apologies for the excess. A smaller example:
>
> f<-structure(list(c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
> ), `2005` = c(32128, 0.1), `2009` = c(52163, 0.1), `2010` = c(63100,
> 0.1), `2011` = c(72461, 0.1), `2012` = c(81313, 0.1)), .Names = c("",
> "2005", "2009", "2010", "2011", "2012"), row.names = 3:4, class = c("cast_df",
> "data.frame"))
>
> nam2<-
> structure(list(var1 = c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
> ), digi = c(0, 1)), .Names = c("var1", "digi"), row.names = c("98",
> "110"), class = "data.frame")
>
> I'm trying to place a thousand separator in the numbers in the table f:
>
>> f
>                              2005    2009    2010    2011    2012
> 3    GDP per capita (LCU) 32128.0 52163.0 63100.0 72461.0 81313.0
> 4 Ratio to EZ GDP Per Cap     0.1     0.1     0.1     0.1     0.1
>
> and also have precision given by variable digi:
>
>> nam2
>                        var1 digi
> 98     GDP per capita (LCU)    0
> 110 Ratio to EZ GDP Per Cap    1
>
> format
> hi<-format(f,big.mark=",",scientific=F)
> gives me the comma, but now I'm not sure how to get the precision.
>
> Your answer seems to be doing what I want, although when I changed the
> testdata slightly
>> testdata[1,1]<-10000
>> hi<-format(testdata,big.mark=",",scientific=F)
>> hi
>     values digits
> 1 10,000.0      0
> 2      5.3      1
> 3      1.1      2
>> apply(hi, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
>         1         2         3
> "10,000."    " 5.3"    " 1.1"
> The decimal appears to be left behind in 10,000.
>
> Unfortunately your approach is a bit too advanced for me, so I can't
> adapt it. Perhaps you could recommend somewhere where I could read up
> on what the caret and other symbols mean in your paste call?
>
> thanks for your help!
>
> Aidan
>
> On Wed, Dec 7, 2011 at 12:05 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>> Hi,
>>
>> Example data is crucial, but small simple example data is even better.
>> I'm too lazy to figure out which bits I need from your data, so here's
>> a simple example of one way to approach your question. You could
>> use gsub() in very much the same manner if you need more complex
>> output.
>>
>>> testdata <- data.frame(values=c(2.0, 5.3, 1.1), digits=c(0, 1, 2))
>>> testdata
>>   values digits
>> 1    2.0      0
>> 2    5.3      1
>> 3    1.1      2
>> # a nice way that works on numbers
>>> apply(testdata, 1, function(x)sprintf(paste("%0.", x[2], "f", sep=""), x[1]))
>> [1] "2"    "5.3"  "1.10"
>>
>> # a messy way that works on strings
>>> apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
>> [1] "2"   "5.3" "1.1"
>>
>> Also note that the second method will not add zeros to pad out the
>> end. If you need that, I'd consider rearranging the order of your
>> steps so that you can use sprintf().
>>
>> Someone else might have a more flexible way too; I'd be interested to see it.
>> Unfortunately I don't think sprintf() has a way to insert a thousands
>> separator, or that would be a one-step solution.
>>
>> Sarah
>>
>> On Wed, Dec 7, 2011 at 6:05 AM, Aidan Corcoran <aidan.corcora...@gmail.com> wrote:
>>> Dear all,
>>>
>>> I'm trying to remove some text after the period (a decimal point) in
>>> the data frame 'hi', below. This is one step in formatting a table. So
>>> I would like e.g.
>>> "2.0" to become "2"
>>> and "5.3" to be "5.3",
>>> where the variable digordered contains the number of digits after the
>>> decimal that I would like to display, in the same order in which the
>>> variables appear in hi. If it makes it easier to use, this info is
>>> also contained in the dataframe nam2. The reason the numbers are
>>> recorded as characters is because I used format to get a thousand
>>> separator, which I also need.
>>>
>>> The string manipulation functions in R generally don't seem to work
>>> with matrices or data frames, so e.g. regexpr("\\.", hi[1,2]) works
>>> but not regexpr("\\.", hi). Finding the location of the period and
>>> then using substring was the approach I was thinking of taking, but
>>> this would seem to need for loops here. I was wondering if anyone
>>> knows any easier ways.
>>>
>>> Thanks very much for any help!
>>>
>>> Aidan
>>>
>>>
>>> digordered<- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
>>> f<-structure(list(c("GDP (LCU,bn)", "GDP ($, bn)", "GDP per capita (LCU)",
>>> "Ratio to EZ GDP Per Cap", "Share of World GDP (Intl $, %)",
>>> "Real GDP Growth (%)", "Population (mn)", "Unemployment Rate (%)",
>>> "Ratio of Employed/Unemployed", "PPP Exchange Rate", "Nominal Exchange Rate (LCU per $)",
>>> "Inflation (%)", "Main Lending Rate to Private Sector (%)", "Claims on Central Gov",
>>> "Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
>>> "Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities"
>>> ), `2005` = c(35662, 809, 32128, 0.1, 4.3, 9, 1110, 3.5, NA,
>>> 14.7, 44.1, 4, 10.8, 7, 15, 22835, NA, NA, NA, NA), `2009` = c(61240,
>>> 1265, 52163, 0.1, 5.2, 6.8, 1174, NA, NA, 16.8, 48.4, 10.9, 12.2,
>>> 14, 31, 47180, 13.6, 9, 10.8, 42.8), `2010` = c(75122, 1632,
>>> 63100, 0.1, 5.5, 10.1, 1191, NA, NA, 18.5, 45.7, 12, NA, 15,
>>> 39, 56787, 14.7, 9.9, 10.5, 41.1), `2011` = c(87455, 1843, 72461,
>>> 0.1, 5.7, 7.8, 1207, NA, NA, 19.6, NA, 10.6, NA, NA, NA, NA,
>>> 13.5, 9.3, 14.3, 35.8), `2012` = c(99459, 2013, 81313, 0.1, 5.9,
>>> 7.5, 1223, NA, NA, 20.5, NA, 8.6, NA, NA, NA, NA, NA, NA, NA,
>>> NA)), .Names = c("", "2005", "2009", "2010", "2011", "2012"), row.names = c(NA,
>>> 20L), class = c("cast_df", "data.frame"))
>>>
>>> hi<-format(f,big.mark=",",scientific=F)
>>> regexpr("\\.", hi) #don't know how to get location of "." in a dataframe of chars
>>>
>>>
>>> nam2<- structure(list(var1 = c("GDP (LCU,bn)", "GDP ($, bn)", "GDP per capita (LCU)",
>>> "Ratio to EZ GDP Per Cap", "GDP per capita (Intl $)", "EU GDP per capita (Intl $)",
>>> "Share of World GDP (Intl $, %)", "Real GDP Growth (%)", "Population (mn)",
>>> "Unemployment Rate (%)", "Ratio of Employed/Unemployed", "Employment (1000s)",
>>> "Unemployment (1000s)", "PPP Exchange Rate", "Nominal Exchange Rate (LCU per $)",
>>> "Inflation (%)", "Main Lending Rate to Private Sector (%)", "Claims on Central Gov",
>>> "Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
>>> "Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities",
>>> "Reserves"), digi = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0,
>>> 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0)), .Names = c("var1", "digi"
>>> ), row.names = c("96", "97", "98", "110", "99", "100", "101",
>>> "102", "103", "111", "112", "104", "105", "106", "107", "108",
>>> "109", "114", "115", "113", "119", "120", "121", "122", "116"
>>> ), class = "data.frame")
>>>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
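
For readers following the thread, here is a minimal sketch that pulls its pieces together. It is not from the original posts. The first helper spells out, in comments, what each part of the sub() pattern does and applies the two-step trim described above. The second shows one possible one-step route on numeric input using base R's formatC(), which accepts both digits and big.mark. The helper names trim_decimals and format_fixed, and the small test vectors, are hypothetical.

```r
# Hypothetical helper: trim an already formatted number string to `digits`
# decimal places, following the two-step sub() approach from the thread.
trim_decimals <- function(value, digits) {
  # Pattern pieces:
  #   ^       anchors the match at the start of the string
  #   .*      any run of characters (the integer part, commas included)
  #   \\.     a literal decimal point (a bare . would match any character)
  #   \\d{n}  exactly n digits, with n pasted in from `digits`
  #   (\\d*)  any remaining digits, which the replacement "\\1" drops
  pattern <- paste0("(^.*\\.\\d{", digits, "})(\\d*)")
  kept <- sub(pattern, "\\1", value)
  # Second step: drop a decimal point left at the very end ($ = end of string).
  sub("\\.$", "", kept)
}

# Hypothetical helper: format a number in one step, assuming numeric input.
# format = "f" gives fixed notation; big.mark adds the thousands separator.
format_fixed <- function(value, digits) {
  formatC(value, format = "f", digits = digits, big.mark = ",")
}

values_chr <- c("10,000.0", "5.321", "1.1")
values_num <- c(10000, 5.321, 1.1)
digits <- c(0, 1, 2)

# mapply() pairs each value with its own number of digits.
trimmed <- unname(mapply(trim_decimals, values_chr, digits))
fixed <- unname(mapply(format_fixed, values_num, digits))

print(trimmed)  # "10,000" "5.3" "1.1"
print(fixed)    # "10,000" "5.3" "1.10"
```

The two routes differ in the last element: the string version leaves "1.1" alone because there is nothing to trim, while the numeric version pads to "1.10", as noted in the thread for sprintf().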