Hi,
Assuming that "out_data.txt" is the output you expected:
dat1<- read.table("data1.txt",header=TRUE,stringsAsFactors=FALSE)
dat2<- read.table("data2.txt",header=TRUE,stringsAsFactors=FALSE)
out_dat<- read.table("out_data.txt",header=TRUE,stringsAsFactors=FALSE)
out_dat2<-merge(dat1[,1:4],dat2,by="ID")
identical(out_dat,out_dat2)
#[1] TRUE
A.K.

________________________________
From: Vivek Das <vd4mm...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>
Sent: Tuesday, May 7, 2013 6:07 PM
Subject: Re: R help for creating expression data of Differentially expressed genes

Hi Arun,

My data sets are as in the provided files. I am providing the sample files. I guess this will give a better idea of the kind of work I want to do with the two files and the kind of script I am trying to write. I hope you can give me some suggestions regarding this. I am new to R, so I am having trouble using the different functions for my work. Anyone who can help me out with this would be of great help.

----------------------------------------------------------
Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek....@ieo.eu
vchris...@yahoo.co.in
vd4mm...@gmail.com

On Tue, May 7, 2013 at 10:36 PM, arun <smartpink...@yahoo.com> wrote:

>Hi Vivek,
>
>Maybe this helps:
>set.seed(35)
>dat1<- cbind(ID=1:8,
>as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))
>
>set.seed(38)
>dat2<- cbind(ID= sample(1:20,8,replace=FALSE),
>as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
>colnames(dat2)[-1]<-gsub("V","X",colnames(dat2)[-1])
>merge(dat1[,1:2],dat2[,1:31],by="ID")
>#  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
>#1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
>#2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
>#3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
>#4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
>#5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
>#  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
>#1  14  27   3  21   6  44  33  42  10  29
>#2  48  13   8  47  18   9  23   9  44   3
>#3  25  14  31  19  14   6  26  13   6  49
>#4  43  28  15   6   9  19  43  21  41  21
>#5   1  27  18   3  42   5  16  39  46  47
>
>A.K.
>
>----- Original Message -----
>From: Vivek Das <vd4mm...@gmail.com>
>To: arun <smartpink...@yahoo.com>
>Cc:
>Sent: Tuesday, May 7, 2013 3:45 PM
>Subject: R help for creating expression data of Differentially expressed genes
>
>Hi Arun,
>
>I need some help regarding R scripting. I have two data files, one containing
>seven columns and the other containing 33. Both files have a unique identifier
>column, ID. I want to create another file that has the first two columns
>of the first file and the 31 columns of the second file, matched on the
>basis of ID. The first file has gene IDs and gene names for around 500 genes,
>and I want an output file that has all of those and the other attributes
>as well. I want the output file to have all attributes matching the
>IDs of the first file, so that I get an output of 500 rows with all the
>attributes of the second file. I am new to R and am having trouble with the
>merge function. If you can help, it would be great.
>
>Regards,
>Vivek
>
>Sent from my iPad
>
>On 07/May/2013, at 21:13, arun <smartpink...@yahoo.com> wrote:
>
>> Hi Ye,
>>
>> For the NA in the ID column:
>>
>> dat1<- read.table(text="
>> ObsNumber ID Weight
>> 1 0001 12
>> 2 0001 13
>> 3 0001 14
>> 4 0002 16
>> 5 0002 17
>> 6 N/A 18
>> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"),na.strings="N/A")
>> unlist(lapply(split(dat1,dat1$ID),function(x)
>> with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>> #[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
>> A.K.
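The reply above returns only five IDs because split() drops rows whose grouping value is NA. If the row with the missing ID should instead get a label such as "N/A_1", one possible approach is to recode NA to a placeholder string first and then number the rows within each group with ave(). This is only a sketch under that assumption; the names id_label and counter are introduced here for illustration and do not appear in the thread.

```r
# Same toy data as in the reply; "N/A" is read in as NA in the ID column.
dat1 <- read.table(text = "
ObsNumber ID Weight
1 0001 12
2 0001 13
3 0001 14
4 0002 16
5 0002 17
6 N/A 18
", header = TRUE, colClasses = c("numeric", "character", "numeric"),
   na.strings = "N/A")

# Assumption: missing IDs should be labelled "N/A" instead of being dropped.
id_label <- ifelse(is.na(dat1$ID), "N/A", dat1$ID)

# ave() runs seq_along() within each group and returns the counters
# in the original row order, so no rows are lost or reordered.
counter <- ave(seq_along(id_label), id_label, FUN = seq_along)

dat1$UniqueID <- paste(id_label, counter, sep = "_")
dat1$UniqueID
```

Because the result has one element per row, the assignment to dat1$UniqueID does not hit the "replacement has fewer rows" error mentioned in the follow-up.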
>> ________________________________
>> From: Ye Lin <ye...@lbl.gov>
>> To: arun <smartpink...@yahoo.com>
>> Cc: R help <r-help@r-project.org>
>> Sent: Tuesday, May 7, 2013 2:54 PM
>> Subject: Re: [R] create unique ID for each group
>>
>> Thanks A.K. But I have "NA" in the ID column, so when I apply the code, it gives
>> me an error saying the replacement has fewer rows than the data has. Is there any
>> way, for ID=N/A, to return something like "N/A_1", in order as well?
>>
>> On Tue, May 7, 2013 at 11:17 AM, arun <smartpink...@yahoo.com> wrote:
>>
>>> Hi,
>>> Sorry, a mistake:
>>> dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x)
>>> with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>> dat1
>>> #  ObsNumber   ID Weight UniqueID
>>> #1         1 0001     12   0001_1
>>> #2         2 0001     13   0001_2
>>> #3         3 0001     14   0001_3
>>> #4         4 0002     16   0002_1
>>> #5         5 0002     17   0002_2
>>>
>>> dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x)
>>> with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>
>>> A.K.
>>>
>>> ----- Original Message -----
>>> From: arun <smartpink...@yahoo.com>
>>> To: Ye Lin <ye...@lbl.gov>
>>> Cc: R help <r-help@r-project.org>
>>> Sent: Tuesday, May 7, 2013 2:10 PM
>>> Subject: Re: [R] create unique ID for each group
>>>
>>> Hi,
>>>
>>> Try this:
>>> dat1<- read.table(text="
>>> ObsNumber ID Weight
>>> 1 0001 12
>>> 2 0001 13
>>> 3 0001 14
>>> 4 0002 16
>>> 5 0002 17
>>> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"))
>>> dat2<- read.table(text="
>>> ID Height
>>> 0001 3.2
>>> 0001 2.6
>>> 0001 3.2
>>> 0002 2.2
>>> 0002 2.6
>>> ",sep="",header=TRUE,colClasses=c("character","numeric"))
>>> dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
>>> dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
>>> dat2
>>> #    ID Height UniqueID
>>> #1 0001    3.2   0001_1
>>> #2 0001    2.6   0001_2
>>> #3 0001    3.2   0001_3
>>> #4 0002    2.2   0002_4
>>> #5 0002    2.6   0002_5
>>> A.K.
>>>
>>> ----- Original Message -----
>>> From: Ye Lin <ye...@lbl.gov>
>>> To: R help <r-help@r-project.org>
>>> Cc:
>>> Sent: Tuesday, May 7, 2013 1:54 PM
>>> Subject: [R] create unique ID for each group
>>>
>>> Hey All,
>>>
>>> I have a dataset (dat1) like this:
>>>
>>> ObsNumber ID   Weight
>>> 1         0001 12
>>> 2         0001 13
>>> 3         0001 14
>>> 4         0002 16
>>> 5         0002 17
>>>
>>> And another dataset (dat2) like this:
>>>
>>> ID   Height
>>> 0001 3.2
>>> 0001 2.6
>>> 0001 3.2
>>> 0002 2.2
>>> 0002 2.6
>>>
>>> I want to merge dat1 and dat2 based on "ID", in order. I know "match" only
>>> returns the first match it finds, so I am thinking of creating a unique ID
>>> column in dat1 and dat2, then merging.
>>> But I don't know how to do that so that it looks like this:
>>>
>>> dat1:
>>>
>>> ObsNumber ID   Weight UniqueID
>>> 1         0001 12     0001_1
>>> 2         0001 13     0001_2
>>> 3         0001 14     0001_3
>>> 4         0002 16     0002_1
>>> 5         0002 17     0002_2
>>>
>>> dat2:
>>>
>>> ID   Height UniqueID
>>> 0001 3.2    0001_1
>>> 0001 2.6    0001_2
>>> 0001 3.2    0001_3
>>> 0002 2.2    0002_1
>>> 0002 2.6    0002_2
>>>
>>> Or, if it is possible to merge dat1 and dat2 by matching "ID" but return the
>>> matches in order, that would be great!
>>>
>>> Thanks for your help!
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
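For the original question at the bottom of the thread (merging dat1 and dat2 on "ID" while pairing rows in order), the UniqueID string is not strictly needed. A minimal sketch of an alternative, assuming the rows of each ID appear in the same order in both data frames: add a within-ID counter to each data frame and merge on both columns. The column name Seq and the object merged are hypothetical names introduced here, not part of the thread.

```r
# Toy data copied from the question.
dat1 <- read.table(text = "
ObsNumber ID Weight
1 0001 12
2 0001 13
3 0001 14
4 0002 16
5 0002 17
", header = TRUE, colClasses = c("numeric", "character", "numeric"))

dat2 <- read.table(text = "
ID Height
0001 3.2
0001 2.6
0001 3.2
0002 2.2
0002 2.6
", header = TRUE, colClasses = c("character", "numeric"))

# Within-ID position: 1, 2, 3 for ID 0001, then 1, 2 for ID 0002.
dat1$Seq <- ave(seq_along(dat1$ID), dat1$ID, FUN = seq_along)
dat2$Seq <- ave(seq_along(dat2$ID), dat2$ID, FUN = seq_along)

# Join on the pair (ID, Seq), so the k-th row of an ID in dat1
# is matched with the k-th row of the same ID in dat2.
merged <- merge(dat1, dat2, by = c("ID", "Seq"))

# merge() may reorder rows; restore the original observation order.
merged <- merged[order(merged$ObsNumber), ]
merged
```

Merging on two key columns avoids building and parsing a pasted string, and the counter can be dropped afterwards with merged$Seq <- NULL if it is not wanted.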