Re: [R] Data manipulation problem

moleps islon Tue, 06 Apr 2010 06:57:07 -0700

OK, next question. It is still a data manipulation problem, so I
believe the heading still applies.


##So now I read my population data from excel.
pop<-read.csv("pop.csv")

typeof(pop) ## yields a list (a data frame) with age-specific population
            ## rows and one population column per year; read.csv
            ## prefixes the year column names with X

c<-(1953:2008)
names(pop)<-c   ## assumes pop has exactly one column per year, 1953-2008
c.div<-cut(c,breaks=seq(1950,2010,by=5))

Now I'd like to sum the age-specific population over the individual
levels of -c.div- and generate a new table with age-specific
rows and columns containing the 5-year bins instead of the original
yearly data. Do I have to program this from scratch, or is there
an existing function I can use?
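
One possible way to do this with base R, as a minimal sketch rather than a definitive answer: rowsum() adds up the rows of a matrix by group, so transposing the table, summing by 5-year bin, and transposing back gives age-specific rows with one column per bin. The data below are a made-up stand-in for pop.csv (three hypothetical age groups, every cell set to 1), and the names yr, yr5 and pop5 are introduced here for illustration only. It assumes every column of pop is a year column; an age column, if the real file has one, would have to be dropped or moved to the row names first.

```r
## Fake stand-in for pop.csv: 3 age groups (rows) x years 1953-2008 (columns).
## Hypothetical layout; the real file may differ.
years <- 1953:2008
pop <- as.data.frame(matrix(1, nrow = 3, ncol = length(years)))
names(pop) <- paste0("X", years)           # mimics the X prefix from read.csv
rownames(pop) <- c("0-4", "5-9", "10-14")  # made-up age group labels

## Recover the numeric years from the column names and bin them.
yr  <- as.integer(sub("^X", "", names(pop)))
yr5 <- cut(yr, breaks = seq(1950, 2010, by = 5))

## rowsum() sums rows by group: transpose so years are rows,
## sum within each 5-year bin, then transpose back.
pop5 <- t(rowsum(t(as.matrix(pop)), group = yr5))
pop5
```

Note that cut() builds right-closed intervals by default, so (1950,1955] covers 1951 to 1955. Whatever binning is used here should match the one used for the case counts; right = FALSE would give bins such as 1950 to 1954 instead.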


//M






qta<- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest =
TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE))

On Mon, Apr 5, 2010 at 10:11 PM, moleps <mole...@gmail.com> wrote:
>
> Thx Erik,
> I have no idea what went wrong with the other code snippet, but this one 
> works.. Appreciate it.
>
> qta<- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest = 
> TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE))
>
> M
>
>
> On 5. apr. 2010, at 21.45, Erik Iverson wrote:
>
>> I don't know what your data are like, since you haven't given a reproducible 
>> example. I was imagining something like:
>>
>> ## generate fake data
>> age <- sample(20:90, 100, replace = TRUE)
>> year <- sample(1950:2000, 100, replace = TRUE)
>>
>> ##look at big table
>> table(age, year)
>>
>> ## categorize data
>> ## see include.lowest and right arguments to cut
>> age.factor <- cut(age, breaks = seq(20, 90, by = 10),
>>                  include.lowest = TRUE)
>>
>> year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
>>                   include.lowest = TRUE)
>>
>> table(age.factor, year.factor)
>>
>> moleps wrote:
>>> I already tried the regression modeling approach. However, the 
>>> epidemiologist (referee) turns out to be quite fond of comparing the 
>>> incidence rates to different standard populations, hence the need for this 
>>> laborious approach. Trying the "cutting" approach, I ended up with:
>>>> table (age5)
>>> age5
>>>    (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]
>>>       35       34       33       47       51      109
>>>  (30,35]  (35,40]  (40,45]  (45,50]  (50,55]  (55,60]
>>>      157      231      362      511      745      926
>>>  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
>>>     1002      866      547      247       82       18
>>>> table (yr5)
>>> yr5
>>> (1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
>>>           3           5           5           5           5           5
>>> (1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
>>>           5           5           5           5           5           3
>>>> table (yr5,age5)
>>> Error in table(yr5, age5) : all arguments must have the same length
>>> Sincerely,
>>> M
>>> On 5. apr. 2010, at 20.59, Bert Gunter wrote:
>>>> You have tempted, and being weak, I yield to temptation:
>>>>
>>>> "Any good ideas?"
>>>>
>>>> Yes. Don't do this.
>>>>
>>>> (what you probably really want to do is fit a model with age as a factor,
>>>> which can be done statistically e.g. by logistic regression; or graphically
>>>> using conditioning plots, e.g. via trellis graphics (the lattice package).
>>>> This avoids the arbitrariness and discontinuities of binning by age range.)
>>>>
>>>> Bert Gunter
>>>> Genentech Nonclinical Biostatistics
>>>>
>>>> -----Original Message-----
>>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>>>> Behalf Of moleps
>>>> Sent: Monday, April 05, 2010 11:46 AM
>>>> To: r-help@r-project.org
>>>> Subject: [R] Data manipulation problem
>>>>
>>>> Dear R'ers.
>>>>
>>>> I've got a dataset with age and year of diagnosis. In order to
>>>> age-standardize the incidence I need to transform the data into a matrix
>>>> with age-groups (divided in 5 or 10 years) along one axis and year divided
>>>> into 5 years along the other axis. Each cell should contain the number of
>>>> cases for that age group and for that period.
>>>> I.e.
>>>> My data format now is
>>>> ID-age (to one decimal)-year(yearly data).
>>>>
>>>> What I'd like is
>>>>
>>>> age 1960-1965 1966-1970 etc...
>>>> 0-5 3 8 10 15
>>>> 6-10 2 5 8 13
>>>> etc..
>>>>
>>>>
>>>> Any good ideas?
>>>>
>>>> Regards,
>>>> M
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>
>

