OK, next question, which is still a data manipulation problem, so I believe the subject heading is still OK.

## So now I read my population data from Excel.
pop <- read.csv("pop.csv")
typeof(pop)
## yields a list where I have age-specific population rows and yearly population columns, where the years are prefixed by X
c <- (1953:2008)
names(pop) <- c
c.div <- cut(c, breaks = seq(1950, 2010, by = 5))

Now I'd like to sum the age-specific population over the individual levels of -c.div- and generate a new table for this, with age-specific rows and columns containing the 5-year bins instead of the original yearly data. Do I have to program this from scratch, or is it possible to use an already existing function?

//M

qta <- table(cut(age, breaks = seq(0, 100, by = 10), include.lowest = TRUE),
             cut(year, breaks = seq(1950, 2010, by = 5), include.lowest = TRUE))

On Mon, Apr 5, 2010 at 10:11 PM, moleps <mole...@gmail.com> wrote:
>
> Thx Erik,
> I have no idea what went wrong with the other code snippet, but this one
> works.. Appreciate it.
>
> qta<- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest =
> TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE))
>
> M
>
>
> On 5. apr. 2010, at 21.45, Erik Iverson wrote:
>
>> I don't know what your data are like, since you haven't given a reproducible
>> example. I was imagining something like:
>>
>> ## generate fake data
>> age <- sample(20:90, 100, replace = TRUE)
>> year <- sample(1950:2000, 100, replace = TRUE)
>>
>> ## look at big table
>> table(age, year)
>>
>> ## categorize data
>> ## see include.lowest and right arguments to cut
>> age.factor <- cut(age, breaks = seq(20, 90, by = 10),
>>                   include.lowest = TRUE)
>>
>> year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
>>                    include.lowest = TRUE)
>>
>> table(age.factor, year.factor)
>>
>> moleps wrote:
>>> I already did try the regression modeling approach. However, the
>>> epidemiologist (referee) turns out to be quite fond of comparing the
>>> incidence rates to different standard populations, hence the need for this
>>> laborious approach.
>>> And trying the "cutting" approach I ended up with:
>>>
>>> > table(age5)
>>> age5
>>>    (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]
>>>       35       34       33       47       51      109      157      231
>>>  (40,45]  (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]
>>>      362      511      745      926     1002      866      547      247
>>>  (80,85] (85,100]
>>>       82       18
>>> > table(yr5)
>>> yr5
>>> (1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
>>>           3           5           5           5           5           5
>>> (1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
>>>           5           5           5           5           5           3
>>> > table(yr5, age5)
>>> Error in table(yr5, age5) : all arguments must have the same length
>>>
>>> Sincerely,
>>> M
>>>
>>> On 5. apr. 2010, at 20.59, Bert Gunter wrote:
>>>
>>>> You have tempted, and being weak, I yield to temptation:
>>>>
>>>> "Any good ideas?"
>>>>
>>>> Yes. Don't do this.
>>>>
>>>> (what you probably really want to do is fit a model with age as a factor,
>>>> which can be done statistically e.g. by logistic regression; or graphically
>>>> using conditioning plots, e.g. via trellis graphics (the lattice package).
>>>> This avoids the arbitrariness and discontinuities of binning by age range.)
>>>>
>>>> Bert Gunter
>>>> Genentech Nonclinical Biostatistics
>>>>
>>>> -----Original Message-----
>>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>>>> Behalf Of moleps
>>>> Sent: Monday, April 05, 2010 11:46 AM
>>>> To: r-help@r-project.org
>>>> Subject: [R] Data manipulation problem
>>>>
>>>> Dear R'ers.
>>>>
>>>> I've got a dataset with age and year of diagnosis. In order to
>>>> age-standardize the incidence I need to transform the data into a matrix
>>>> with age-groups (divided in 5 or 10 years) along one axis and year divided
>>>> into 5 years along the other axis. Each cell should contain the number of
>>>> cases for that age group and for that period.
>>>> I.e.
>>>> My data format now is
>>>> ID-age (to one decimal)-year(yearly data).
>>>>
>>>> What I'd like is
>>>>
>>>> age    1960-1965  1966-1970  etc...
>>>> 0-5        3          8       10   15
>>>> 6-10       2          5        8   13
>>>> etc..
>>>>
>>>>
>>>> Any good ideas?
>>>>
>>>> Regards,
>>>> M
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
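
On the question at the top of the thread (summing the yearly population columns into 5-year bins): this probably does not need to be programmed from scratch. A minimal sketch using base R's rowsum() follows. The data here are made up because pop.csv is not available, so the three age rows, their labels, and the random counts are assumptions for illustration only; the year range and breaks are taken from the post.

```r
## Hypothetical stand-in for pop.csv: 3 age groups (rows) x 56 years (columns)
years <- 1953:2008
set.seed(1)
pop <- as.data.frame(matrix(sample(1000:2000, 3 * length(years), replace = TRUE),
                            nrow = 3))
names(pop) <- years
rownames(pop) <- c("0-4", "5-9", "10-14")   # made-up age labels

## Assign each year to a 5-year bin: (1950,1955], (1955,1960], ...
year.bin <- cut(years, breaks = seq(1950, 2010, by = 5))

## rowsum() adds up rows by group, so transpose to put the years in rows,
## sum within each bin, then transpose back to age x period
pop5 <- t(rowsum(t(as.matrix(pop)), group = year.bin))
pop5
```

With the default right = TRUE in cut(), the first bin covers 1953-1955 and the last covers 2006-2008, so the two end bins hold only three years each. If the periods should instead run 1950-1954, 1955-1959 and so on, right = FALSE in the cut() call would be the thing to try.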