Hi Arun,

I am still having trouble: the output is identical for all rows when I use the merge function. I think this is because the data2 file I sent you does not contain the exact genes I actually have. IDs are likely to be repeated in both files. The problem now is that when the merge runs on ID, it only prints rows for IDs that are repeated more than once, and IDs that have no repetitions in data.txt are ignored.

For example, say the locus XLOC_002126 is an ID that is repeated 10 times in data1.txt. The merge only works on IDs like this one, which are repeated more than once in data1.txt, and merges them with their respective attributes. This happens as many times as the ID is repeated in data1.txt, and the next ID that gets merged is again one that has several repeats in data1.txt. Here is an example of the output I am getting below.

ID test_id gene locus Sample_118p_0 Sample_118rp3_0 Sample_118rz_0 Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0 Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0 Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0 Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0 Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
[The same XLOC_002126 row is printed over and over; every further line of the output is identical to the one above.]

----------------------------------------------------------
Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek....@ieo.eu
        vchris...@yahoo.co.in
        vd4mm...@gmail.com

On Wed, May 8, 2013 at 12:35 AM, arun <smartpink...@yahoo.com> wrote:

> HI,
> Assuming that "out_dat.txt" is the output you expected.
>
> dat1<- read.table("data1.txt",header=TRUE,stringsAsFactors=FALSE)
> dat2<- read.table("data2.txt",header=TRUE,stringsAsFactors=FALSE)
> out_dat<- read.table("out_data.txt",header=TRUE,stringsAsFactors=FALSE)
> out_dat2<-merge(dat1[,1:4],dat2,by="ID")
> identical(out_dat,out_dat2)
> #[1] TRUE
> A.K.
>
> ________________________________
> From: Vivek Das <vd4mm...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Cc: R help <r-help@r-project.org>
> Sent: Tuesday, May 7, 2013 6:07 PM
> Subject: Re: R help for creating expression data of Differentially expressed genes
>
> HI Arun,
>
> My data sets are as in the provided files; I am attaching sample files.
> I hope they give a better idea of the kind of work I want to do with the
> two files and the kind of script I am trying to write. I hope you can
> give me some suggestions. I am new to R, so I am having trouble using
> the different functions for my work.
>
> Any help with this would be greatly appreciated.
>
> ----------------------------------------------------------
> Vivek Das
> PhD Student in Computational Biology
> Giuseppe Testa's Lab
> European School of Molecular Medicine
> IFOM-IEO Campus
> Via Adamello, 16
> Milan, Italy
>
> emails: vivek....@ieo.eu
>         vchris...@yahoo.co.in
>         vd4mm...@gmail.com
>
> On Tue, May 7, 2013 at 10:36 PM, arun <smartpink...@yahoo.com> wrote:
>
> > Hi Vivek,
> >
> > Maybe this helps:
> > set.seed(35)
> > dat1<- cbind(ID=1:8, as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))
> >
> > set.seed(38)
> > dat2<- cbind(ID= sample(1:20,8,replace=FALSE), as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
> > colnames(dat2)[-1]<-gsub("V","X",colnames(dat2)[-1])
> > merge(dat1[,1:2],dat2[,1:31],by="ID")
> > #  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
> > #1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
> > #2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
> > #3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
> > #4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
> > #5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
> > #  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
> > #1  14  27   3  21   6  44  33  42  10  29
> > #2  48  13   8  47  18   9  23   9  44   3
> > #3  25  14  31  19  14   6  26  13   6  49
> > #4  43  28  15   6   9  19  43  21  41  21
> > #5   1  27  18   3  42   5  16  39  46  47
> >
> > A.K.
> >
> > ----- Original Message -----
> > From: Vivek Das <vd4mm...@gmail.com>
> > To: arun <smartpink...@yahoo.com>
> > Cc:
> > Sent: Tuesday, May 7, 2013 3:45 PM
> > Subject: R help for creating expression data of Differentially expressed genes
> >
> > Hi Arun,
> >
> > I need some help regarding R scripting. I have two data files, one
> > containing seven columns and the other containing 33. Both files have
> > a unique identifier column, ID.
> > I want to create another file which should have the first two columns
> > of the first file and the 31 columns of the second file, matched on
> > the basis of ID. The first file has gene IDs and gene names for around
> > 500 genes, and I want an output file that has all of those along with
> > the other attributes. I want the output file to have all attributes
> > matching the IDs of the first file, so that I get an output of 500 rows
> > with all the attributes of the second file. I am new to R and am having
> > trouble with the merge function. If you can help it will be great.
> >
> > Regards,
> > Vivek
> >
> > Sent from my iPad
> >
> > On 07/May/2013, at 21:13, arun <smartpink...@yahoo.com> wrote:
> >
> >> HI Ye,
> >>
> >> For the NA in ID column,
> >>
> >> dat1<- read.table(text="
> >> ObsNumber ID Weight
> >> 1 0001 12
> >> 2 0001 13
> >> 3 0001 14
> >> 4 0002 16
> >> 5 0002 17
> >> 6 N/A 18
> >> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"),na.strings="N/A")
> >> unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >> #[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
> >> A.K.
> >> ________________________________
> >> From: Ye Lin <ye...@lbl.gov>
> >> To: arun <smartpink...@yahoo.com>
> >> Cc: R help <r-help@r-project.org>
> >> Sent: Tuesday, May 7, 2013 2:54 PM
> >> Subject: Re: [R] create unique ID for each group
> >>
> >> Thanks A.K. But I have "NA" in the ID column, so when I apply the code
> >> it gives me an error saying the replacement has fewer rows than the
> >> data. Is there any way, for ID=N/A, to return something like "N/A_1"
> >> in order as well?
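
On the N/A question quoted just above: split() drops rows whose grouping value is NA, which is most likely why the replacement came out one row short. What follows is only a minimal sketch of one way around it, assuming it is acceptable to label missing IDs with the string "N/A". The label and the names id_label and counter are hypothetical choices for illustration, not something from the thread. ave() numbers the rows within each group and returns the result in the original row order.

```r
# Toy data from the thread; the ID "N/A" is read in as a missing value.
dat1 <- read.table(text = "
ObsNumber ID Weight
1 0001 12
2 0001 13
3 0001 14
4 0002 16
5 0002 17
6 N/A 18
", header = TRUE, colClasses = c("numeric", "character", "numeric"),
   na.strings = "N/A")

# Give missing IDs an explicit label so they form their own group
# (assumption: "N/A" is a suitable label for this data).
id_label <- ifelse(is.na(dat1$ID), "N/A", dat1$ID)

# ave() applies seq_along() within each group and keeps the row order,
# so no rows are dropped or shuffled.
counter <- ave(seq_along(id_label), id_label, FUN = seq_along)

dat1$UniqueID <- paste(id_label, counter, sep = "_")
dat1$UniqueID
# [1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2" "N/A_1"
```

Because the result has one entry per input row, it can be assigned back to the data frame without the length mismatch reported above.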
> >>
> >> On Tue, May 7, 2013 at 11:17 AM, arun <smartpink...@yahoo.com> wrote:
> >>
> >>> Hi,
> >>> Sorry, a mistake:
> >>> dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >>> dat1
> >>> #  ObsNumber   ID Weight UniqueID
> >>> #1         1 0001     12   0001_1
> >>> #2         2 0001     13   0001_2
> >>> #3         3 0001     14   0001_3
> >>> #4         4 0002     16   0002_1
> >>> #5         5 0002     17   0002_2
> >>>
> >>> dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >>>
> >>> A.K.
> >>>
> >>> ----- Original Message -----
> >>> From: arun <smartpink...@yahoo.com>
> >>> To: Ye Lin <ye...@lbl.gov>
> >>> Cc: R help <r-help@r-project.org>
> >>> Sent: Tuesday, May 7, 2013 2:10 PM
> >>> Subject: Re: [R] create unique ID for each group
> >>>
> >>> Hi,
> >>>
> >>> Try this:
> >>> dat1<- read.table(text="
> >>> ObsNumber ID Weight
> >>> 1 0001 12
> >>> 2 0001 13
> >>> 3 0001 14
> >>> 4 0002 16
> >>> 5 0002 17
> >>> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"))
> >>> dat2<- read.table(text="
> >>> ID Height
> >>> 0001 3.2
> >>> 0001 2.6
> >>> 0001 3.2
> >>> 0002 2.2
> >>> 0002 2.6
> >>> ",sep="",header=TRUE,colClasses=c("character","numeric"))
> >>> dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
> >>> dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
> >>> dat2
> >>> #    ID Height UniqueID
> >>> #1 0001    3.2   0001_1
> >>> #2 0001    2.6   0001_2
> >>> #3 0001    3.2   0001_3
> >>> #4 0002    2.2   0002_4
> >>> #5 0002    2.6   0002_5
> >>> A.K.
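
The per-group counter from the messages above can also be passed straight to merge() as a second key, which is the end goal of this sub-thread. This is a minimal sketch, assuming both tables list the rows of each ID in the order in which they should be paired. The helper column k is a hypothetical name, and ave() is swapped in here for the split()/lapply() construction used in the thread.

```r
# Toy data matching the example tables in the thread
dat1 <- data.frame(ObsNumber = 1:5,
                   ID = c("0001", "0001", "0001", "0002", "0002"),
                   Weight = c(12, 13, 14, 16, 17),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("0001", "0001", "0001", "0002", "0002"),
                   Height = c(3.2, 2.6, 3.2, 2.2, 2.6),
                   stringsAsFactors = FALSE)

# Position of each row within its ID group: 1, 2, 3, 1, 2
dat1$k <- ave(seq_along(dat1$ID), dat1$ID, FUN = seq_along)
dat2$k <- ave(seq_along(dat2$ID), dat2$ID, FUN = seq_along)

# Merging on both columns pairs the first "0001" row of dat1 with the
# first "0001" row of dat2, the second with the second, and so on.
merged <- merge(dat1, dat2, by = c("ID", "k"))

# Restore the original observation order
merged <- merged[order(merged$ObsNumber), ]
merged
```

Using two key columns avoids pasting the ID and counter into a single string, but the pasted UniqueID column from the thread would work as a merge key in the same way.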
> >>>
> >>> ----- Original Message -----
> >>> From: Ye Lin <ye...@lbl.gov>
> >>> To: R help <r-help@r-project.org>
> >>> Cc:
> >>> Sent: Tuesday, May 7, 2013 1:54 PM
> >>> Subject: [R] create unique ID for each group
> >>>
> >>> Hey All,
> >>>
> >>> I have a dataset (dat1) like this:
> >>>
> >>> ObsNumber   ID Weight
> >>>         1 0001     12
> >>>         2 0001     13
> >>>         3 0001     14
> >>>         4 0002     16
> >>>         5 0002     17
> >>>
> >>> And another dataset (dat2) like this:
> >>>
> >>>   ID Height
> >>> 0001    3.2
> >>> 0001    2.6
> >>> 0001    3.2
> >>> 0002    2.2
> >>> 0002    2.6
> >>>
> >>> I want to merge dat1 and dat2 based on "ID", in order. I know "match"
> >>> only returns the first match it finds, so I am thinking of creating a
> >>> unique ID column in dat1 and dat2 and then merging. But I don't know
> >>> how to do that so that the result looks like this:
> >>>
> >>> dat1:
> >>>
> >>> ObsNumber   ID Weight UniqueID
> >>>         1 0001     12   0001_1
> >>>         2 0001     13   0001_2
> >>>         3 0001     14   0001_3
> >>>         4 0002     16   0002_1
> >>>         5 0002     17   0002_2
> >>>
> >>> dat2:
> >>>
> >>>   ID Height UniqueID
> >>> 0001    3.2   0001_1
> >>> 0001    2.6   0001_2
> >>> 0001    3.2   0001_3
> >>> 0002    2.2   0002_1
> >>> 0002    2.6   0002_2
> >>>
> >>> Or, if it is possible to merge dat1 and dat2 by matching "ID" but
> >>> return the matches in order, that would be great!
> >>>
> >>> Thanks for your help!
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
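
Regarding the merge problem described at the top of this message, one possible explanation (an assumption, since data1.txt and data2.txt are not shown in full) is that merge() pairs every matching row of one table with every matching row of the other. An ID that occurs m times in one file and n times in the other then yields m x n output rows, while an ID that finds no partner is dropped by default. The sketch below uses made-up IDs and values, purely for illustration, to show how repeated and unmatched IDs could be checked before merging.

```r
# Hypothetical stand-ins for data1.txt and data2.txt (made-up IDs and values)
dat1 <- data.frame(ID = c("XLOC_A", "XLOC_A", "XLOC_B", "XLOC_D"),
                   gene = c("g1", "g1", "g2", "g4"),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("XLOC_A", "XLOC_A", "XLOC_B", "XLOC_C"),
                   Sample_1 = c(0.5, 0.5, 1.2, 3.4),
                   stringsAsFactors = FALSE)

# Which IDs occur more than once in each table?
dup1 <- unique(dat1$ID[duplicated(dat1$ID)])   # "XLOC_A"
dup2 <- unique(dat2$ID[duplicated(dat2$ID)])   # "XLOC_A"

# Which IDs of dat1 have no partner in dat2 at all?
unmatched <- setdiff(dat1$ID, dat2$ID)         # "XLOC_D"

# A plain merge returns every pairing of matching rows:
# 2 x 2 rows for XLOC_A, 1 row for XLOC_B, none for XLOC_D.
naive <- merge(dat1, dat2, by = "ID")
nrow(naive)   # 5

# If the repeated rows are exact copies, dropping them first gives one
# row per ID; all.x = TRUE keeps dat1 IDs without a partner (as NA).
clean <- merge(unique(dat1), unique(dat2), by = "ID", all.x = TRUE)
nrow(clean)   # 3
```

If the repeated rows are not exact copies, unique() will not collapse them, and the per-ID counter approach discussed in the quoted thread may be a better fit.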