Try this (after making sure that Col_1 in data2 matches the column names in data1):
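On that check: in the quoted question, data2 lists Stage1 to Stage4 while the data1 headers are stage1 to stage4, so an exact match() would return NA for every stage. A minimal sketch of one way to line the names up, assuming they differ only in letter case (the toy tables and the names stage.names and idx are illustrative, not from the original files):

```r
# Toy copies of the two tables from the question; note "Stage1" vs "stage1".
data1 <- data.frame(Taxon  = c("T1", "T2", "T3", "T4"),
                    stage1 = c(0, 0, 0, 1),
                    stage2 = c(0, 1, 0, 0),
                    stage3 = c(1, 1, 0, 0),
                    stage4 = c(1, 0, 1, 0))
data2 <- data.frame(Col_1 = c("Stage1", "Stage2", "Stage3", "Stage4"),
                    Col_2 = c("Group1", "Group1", "Group2", "Group2"),
                    stringsAsFactors = FALSE)

# Stage columns of data1 (everything except Taxon).
stage.names <- names(data1)[-1]

# Case-insensitive lookup of each data1 header in the library file.
idx <- match(tolower(stage.names), tolower(data2$Col_1))

# Stop early if any stage in data1 is missing from the library.
stopifnot(!any(is.na(idx)))

# Keep only the stages present in data1, spelled as data1 spells them.
data2 <- data.frame(Col_1 = stage.names,
                    Col_2 = data2$Col_2[idx],
                    stringsAsFactors = FALSE)
```

With real files, these lines would go right after the read.table() calls and before the split() step in the session that follows.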
> data1 <- read.table(textConnection("Taxon stage1 stage2 stage3 stage4
+ T1 0 0 1 1
+ T2 0 1 1 0
+ T3 0 0 0 1
+ T4 1 0 0 0"), header=TRUE)
> data2 <- read.table(textConnection("Col_1 Col_2
+ stage1 Group1
+ stage2 Group1
+ stage3 Group2
+ stage4 Group2"), header=TRUE, as.is=TRUE)
> closeAllConnections()
> # get the columns to summarize by
> colSumz <- split(data2$Col_1, data2$Col_2)
> # create the output matrix
> result <- matrix(0, nrow=nrow(data1), ncol=length(colSumz))
> colnames(result) <- names(colSumz)
> rownames(result) <- data1$Taxon
> # drop=FALSE keeps a data frame even when a group has only one stage
> for (i in names(colSumz)){
+     result[, i] <- rowSums(data1[, colSumz[[i]], drop=FALSE])
+ }
> result
   Group1 Group2
T1      0      2
T2      1      1
T3      0      1
T4      1      0
>

On Mon, Sep 6, 2010 at 1:49 PM, Martin Hughes <sensei2...@hotmail.com> wrote:
>
> Hi
>
> This question is far less simple than the title suggests, please read
> carefully, thanks.
>
> I have 2 sets of data, both read into R
>
> >data1<-read.table ("1.txt", header=T, sep="\t")
> >data2<-read.table ("2.txt", header=T, sep="\t")
>
> >data1
>
> Taxon stage1 stage2 stage3 stage4
> T1    0      0      1      1
> T2    0      1      1      0
> T3    0      0      0      1
> T4    1      0      0      0
>
> >data2 # this is a library file, it contains all possible values of stage
> >      # (Col_1) that may be contained in the data1 file (headers of each
> >      # column), and what they correspond to in Col_2,
> >      # ie stages 1:2 == Group1
>
> Col_1  Col_2
> Stage1 Group1
> Stage2 Group1
> Stage3 Group2
> Stage4 Group2
>
> I want to get R to combine the columns in data1 based on the information in
> data2 (Col_2), eg in this instance reduce the columns in data1 from 4 to 2,
> summing up the values within each column of data1 to get the result below
>
> Taxon group1 group2
> T1    0      1
> T2    1      1
> T3    0      1
> T4    1      0
>
> I have many datasets which have different numbers of stages, eg one dataset
> will have stage1-10, another will have stage15-35 (data2, Col_2 has all
> possible stage values so will say what group they correspond to)
>
> So far I can isolate the rows of data2 which contain the
> stages in data1 with this:
>
> > # take the header names from data1 minus the 1st column (this is not
> > # found in the data2 library file)
> > data1.names<-names(data1[,-1])
> > # match the vector containing the data1 column header names to those
> > # found in the library file of data2
> > row.numbers<-match(data1.names, data2[,1])
> > # reduce data2 to only include the same stages as found in the data1 file
> > data2.small<-data2[row.numbers, ]
>
> From here on I don't know what to do. Really I wanted to just be able to
> change the header names of data1 to their corresponding name that is found
> in Col_2 and then use some statement that could merge columns in data1 which
> were the same (and also sum the values at each row, dividing by their value
> if they were greater than 1, so I only have 0 or 1 again), but I don't know
> how to do that.
>
> Can someone help me to get the desired result (as in the example above) in a
> way that does not require me to manually merge columns? ie get the example
> output in an automated way that could take any version of the data1 file (ie
> with different stage values) and, using the data2 file (library file, same
> in each instance), get output similar to the example above?
>
>
> Thanks
>
> Martin
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
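
A note on the output: the question asks for 0/1 values after merging (T1 should read 0 1), whereas summing the stages gives counts (T1 reads 0 2). A minimal sketch of one way to get the 0/1 table without a loop, assuming the names in Col_1 already match the data1 headers exactly; it swaps the for loop for base R's rowsum(), and the names groups, counts and presence are illustrative only:

```r
# Toy copies of the example tables, with matching stage names.
data1 <- data.frame(Taxon  = c("T1", "T2", "T3", "T4"),
                    stage1 = c(0, 0, 0, 1),
                    stage2 = c(0, 1, 0, 0),
                    stage3 = c(1, 1, 0, 0),
                    stage4 = c(1, 0, 1, 0))
data2 <- data.frame(Col_1 = c("stage1", "stage2", "stage3", "stage4"),
                    Col_2 = c("Group1", "Group1", "Group2", "Group2"),
                    stringsAsFactors = FALSE)

# Group label for each stage column of data1, in column order.
stage.names <- names(data1)[-1]
groups <- data2$Col_2[match(stage.names, data2$Col_1)]

# rowsum() adds up rows by group, so transpose first to group the columns,
# then transpose back to get taxa in rows and groups in columns.
stage.matrix <- as.matrix(data1[, stage.names, drop = FALSE])
counts <- t(rowsum(t(stage.matrix), groups))
rownames(counts) <- data1$Taxon

# Collapse counts to presence/absence: any value above 0 becomes 1.
presence <- (counts > 0) * 1
print(presence)
```

Here counts holds the same sums as the result matrix from the loop, and presence is the 0/1 version the question describes.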