Re: [R] R help for creating expression data of Differentially expressed genes

Vivek Das Wed, 08 May 2013 01:57:56 -0700

Hi Arun,

I am still having trouble: when I use this merge function, the output rows
are all identical. It seems that the data2 I provided did not contain the
exact genes I have. IDs are likely to be repeated in both files, and the
problem is that when the merge on ID runs, it only prints rows whose IDs
are repeated more than once; IDs that are not repeated in data1.txt are
being ignored. Say I have the locus XLOC_002126 as ID and it is repeated
10 times in data1.txt: the merge then works only on IDs that are repeated
more than once in data1.txt, merging them with their respective
attributes. Each such row is printed as many times as the ID is repeated
in data1.txt, and the next rows are again for an ID that is repeated in
data1.txt. Here is an example of the output I am getting:

ID test_id gene locus Sample_118p_0 Sample_118rp3_0 Sample_118rz_0 Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0 Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0 Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0 Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0 Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667
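A likely explanation for the repeated rows, sketched with small hypothetical data frames `a` and `b` (not the real data1.txt/data2.txt): when an ID occurs m times in one table and n times in the other, merge() returns every pairing, i.e. m x n rows for that ID, and IDs present in only one table are dropped by the default inner join. Removing duplicated rows with unique(), or keeping only the first row per ID with !duplicated(), before merging gives one row per matched ID. This is an illustration under those assumptions, not a diagnosis of the actual files.

```r
# Hypothetical toy tables: "XLOC_1" is repeated in both.
a <- data.frame(ID = c("XLOC_1", "XLOC_1", "XLOC_2"),
                gene = c("MPZ", "MPZ", "ABC"),
                stringsAsFactors = FALSE)
b <- data.frame(ID = c("XLOC_1", "XLOC_1", "XLOC_1", "XLOC_3"),
                Sample_1 = c(0.32, 0.32, 0.32, 1.5),
                stringsAsFactors = FALSE)

# Many-to-many join: 2 copies in a x 3 copies in b = 6 rows for XLOC_1.
# XLOC_2 and XLOC_3 have no partner, so the inner join drops them.
blown_up <- merge(a, b, by = "ID")
nrow(blown_up)   # 6

# Drop exact duplicate rows from each table, then merge.
fixed <- merge(unique(a), unique(b), by = "ID")
nrow(fixed)      # 1

# Alternative: keep only the first row per ID, even if other columns differ.
first_a <- a[!duplicated(a$ID), ]
```

unique() only helps when the duplicated rows are truly identical; if repeated IDs carry different expression values, they are different records and you have to decide which one (or which summary) to keep.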

----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek....@ieo.eu
            vchris...@yahoo.co.in
            vd4mm...@gmail.com


On Wed, May 8, 2013 at 12:35 AM, arun <smartpink...@yahoo.com> wrote:

> Hi,
> Assuming that "out_data.txt" is the output you expected:
>
> dat1 <- read.table("data1.txt", header=TRUE, stringsAsFactors=FALSE)
> dat2 <- read.table("data2.txt", header=TRUE, stringsAsFactors=FALSE)
> out_dat <- read.table("out_data.txt", header=TRUE, stringsAsFactors=FALSE)
> # keep the first four columns of dat1 (ID and its annotation) and join on ID
> out_dat2 <- merge(dat1[,1:4], dat2, by="ID")
> identical(out_dat, out_dat2)
> #[1] TRUE
> A.K.
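Before trusting a merge like the one above, it can help to inspect the key column first. A sketch using small hypothetical stand-ins for `dat1` and `dat2` (not the real files): anyDuplicated() reports whether any ID repeats, setdiff() lists the IDs of `dat1` that the default inner join will silently drop, and `all.x = TRUE` keeps those rows with NA in the columns coming from `dat2`.

```r
# Hypothetical stand-ins for data1.txt and data2.txt.
dat1 <- data.frame(ID = c("X1", "X2", "X3"), gene = c("g1", "g2", "g3"),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("X1", "X3", "X3", "X9"), S1 = c(1.2, 0.4, 0.5, 2.0),
                   stringsAsFactors = FALSE)

# 0 means no duplicates; otherwise the index of the first duplicated element.
anyDuplicated(dat1$ID)   # 0
anyDuplicated(dat2$ID)   # 3

# How often each duplicated ID occurs in dat2.
counts <- table(dat2$ID)
counts[counts > 1]

# IDs of dat1 with no partner in dat2 (these vanish in an inner join).
missing_ids <- setdiff(dat1$ID, dat2$ID)

inner <- merge(dat1, dat2, by = "ID")                # X1 once, X3 twice
left  <- merge(dat1, dat2, by = "ID", all.x = TRUE)  # also keeps X2, with NA
```

Comparing nrow() of the merged result with nrow(dat1) is a quick check: a larger count points to duplicated IDs, a smaller one to IDs missing from the second table.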
>
>
>
>
>
> ________________________________
> From: Vivek Das <vd4mm...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Cc: R help <r-help@r-project.org>
> Sent: Tuesday, May 7, 2013 6:07 PM
> Subject: Re: R help for creating expression data of Differentially
> expressed genes
>
>
>
> Hi Arun,
>
> My data sets are as in the files I have provided; these are sample
> files. I hope this gives a better idea of the kind of work I want to do
> with the two files and the kind of script I am trying to write. I hope
> you can give me some suggestions. I am new to R, so I am having trouble
> using the different functions for my work.
>
> Any help with this would be greatly appreciated.
>
>
>
>
> On Tue, May 7, 2013 at 10:36 PM, arun <smartpink...@yahoo.com> wrote:
>
> >Hi Vivek,
> >
> >Maybe this helps:
> >set.seed(35)
> >dat1 <- cbind(ID=1:8, as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))
> >
> >set.seed(38)
> >dat2 <- cbind(ID=sample(1:20,8,replace=FALSE), as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
> >colnames(dat2)[-1] <- gsub("V","X",colnames(dat2)[-1])
> >merge(dat1[,1:2],dat2[,1:31],by="ID")
> >#  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
> >#1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
> >#2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
> >#3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
> >#4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
> >#5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
> >#  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
> >#1  14  27   3  21   6  44  33  42  10  29
> >#2  48  13   8  47  18   9  23   9  44   3
> >#3  25  14  31  19  14   6  26  13   6  49
> >#4  43  28  15   6   9  19  43  21  41  21
> >#5   1  27  18   3  42   5  16  39  46  47
> >
> >A.K.
> >
> >
> >
> >----- Original Message -----
> >
> >From: Vivek Das <vd4mm...@gmail.com>
> >To: arun <smartpink...@yahoo.com>
> >Cc:
> >
> >Sent: Tuesday, May 7, 2013 3:45 PM
> >Subject: R help for creating expression data of Differentially expressed
> genes
> >
> >Hi Arun,
> >
> >I need some help with R scripting. I have two data files, one containing
> >seven columns and the other containing 33. Both files have a unique
> >identifier column, ID. I want to create another file that has the first
> >two columns of the first file and the 31 columns of the second file,
> >matched on the basis of ID. The first file has gene IDs and gene names
> >for around 500 genes, and I want an output file that has all of those
> >plus the other attributes. That is, I want every ID of the first file
> >matched with all of its attributes, so that I get an output of 500 rows
> >with all the attributes of the second file. I am new to R and am having
> >trouble with the merge function. If you can help, it would be great.
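What is described here is essentially a lookup: one output row per row of the first file, in the same order, with the columns of the second file attached. A sketch with match(), using hypothetical data frames `first` and `second` that share an `ID` column, and assuming `second` has at most one row per ID (match() returns only the first hit). Rows of `first` whose ID is absent from `second` are kept with NA, so the result always has exactly nrow(first) rows.

```r
# Hypothetical stand-ins: 'first' = gene list (ID + gene name),
# 'second' = expression table (ID + sample columns).
first <- data.frame(ID = c("XLOC_3", "XLOC_1", "XLOC_7"),
                    gene = c("g3", "g1", "g7"),
                    stringsAsFactors = FALSE)
second <- data.frame(ID = c("XLOC_1", "XLOC_2", "XLOC_3"),
                     S1 = c(0.5, 1.1, 2.3),
                     S2 = c(0.0, 4.2, 1.8),
                     stringsAsFactors = FALSE)

# Row index in 'second' for each ID of 'first' (NA when absent).
idx <- match(first$ID, second$ID)

# Keep all columns of 'first'; append every column of 'second' except ID.
out <- cbind(first, second[idx, -1, drop = FALSE])
rownames(out) <- NULL

out
#       ID gene  S1  S2
# 1 XLOC_3   g3 2.3 1.8
# 2 XLOC_1   g1 0.5 0.0
# 3 XLOC_7   g7  NA  NA
```

Unlike merge(), this never multiplies rows when `second` has repeated IDs; it silently takes the first match instead, so it is worth checking anyDuplicated(second$ID) first if that matters.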
> >
> >Regards,
> >Vivek
> >
> >Sent from my iPad
> >
> >On 07/May/2013, at 21:13, arun <smartpink...@yahoo.com> wrote:
> >
> >> Hi Ye,
> >>
> >> For the NA in the ID column:
> >>
> >> dat1 <- read.table(text="
> >> ObsNumber     ID          Weight
> >>      1                 0001         12
> >>      2                 0001          13
> >>      3                 0001           14
> >>      4                  0002         16
> >>       5                 0002         17
> >>      6                   N/A          18
> >>
> >> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"),na.strings="N/A")
> >> unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >> #[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
> >> A.K.
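The split() approach above drops the row whose ID is NA (split() ignores NA groups), which is why its result is one element shorter than the data and cannot be assigned back as a column; it also returns the labels grouped by ID rather than in row order. A sketch of an alternative with ave(), assuming it is acceptable to treat a missing ID as its own group labelled with the literal string "N/A": the counter is computed per group but returned in the original row order, so it can be assigned directly.

```r
dat1 <- read.table(text = "
ObsNumber ID Weight
1 0001 12
2 0001 13
3 0001 14
4 0002 16
5 0002 17
6 N/A 18
", header = TRUE, colClasses = c("numeric", "character", "numeric"),
   na.strings = "N/A")

# Assumption: missing IDs form one group labelled "N/A".
id <- ifelse(is.na(dat1$ID), "N/A", dat1$ID)

# Running count within each ID, returned in the original row order.
counter <- ave(seq_along(id), id, FUN = seq_along)

dat1$UniqueID <- paste(id, counter, sep = "_")
dat1$UniqueID
# [1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2" "N/A_1"
```

Because ave() writes each group's result back into the positions of that group, the data does not need to be sorted by ID first.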
> >> ________________________________
> >> From: Ye Lin <ye...@lbl.gov>
> >> To: arun <smartpink...@yahoo.com>
> >> Cc: R help <r-help@r-project.org>
> >> Sent: Tuesday, May 7, 2013 2:54 PM
> >> Subject: Re: [R] create unique ID for each group
> >>
> >>
> >>
> >> Thanks A.K. But I have "NA" in the ID column, so when I apply the code it
> >> gives me an error saying the replacement has fewer rows than the data.
> >> Is there any way, for ID=N/A, to return something like "N/A_1" in order as well?
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, May 7, 2013 at 11:17 AM, arun <smartpink...@yahoo.com> wrote:
> >>
> >>> Hi,
> >>> Sorry, a mistake:
> >>> dat1$UniqueID <- unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >>> dat1
> >>>  # ObsNumber   ID Weight UniqueID
> >>> #1         1 0001     12   0001_1
> >>> #2         2 0001     13   0001_2
> >>> #3         3 0001     14   0001_3
> >>> #4         4 0002     16   0002_1
> >>> #5         5 0002     17   0002_2
> >>>
> >>> dat2$UniqueID <- unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >>>
> >>> A.K.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: arun <smartpink...@yahoo.com>
> >>> To: Ye Lin <ye...@lbl.gov>
> >>> Cc: R help <r-help@r-project.org>
> >>> Sent: Tuesday, May 7, 2013 2:10 PM
> >>> Subject: Re: [R] create unique ID for each group
> >>>
> >>>
> >>>
> >>> Hi,
> >>>
> >>> Try this:
> >>> dat1<- read.table(text="
> >>> ObsNumber     ID          Weight
> >>>      1                 0001         12
> >>>      2                 0001          13
> >>>      3                 0001           14
> >>>      4                  0002         16
> >>>       5                 0002         17
> >>> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"))
> >>> dat2<- read.table(text="
> >>> ID               Height
> >>> 0001            3.2
> >>> 0001             2.6
> >>> 0001             3.2
> >>> 0002             2.2
> >>> 0002              2.6
> >>> ",sep="",header=TRUE,colClasses=c("character","numeric"))
> >>> dat1$UniqueID <- with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
> >>> dat2$UniqueID <- with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
> >>>  dat2
> >>> #    ID Height UniqueID
> >>> #1 0001    3.2   0001_1
> >>> #2 0001    2.6   0001_2
> >>> #3 0001    3.2   0001_3
> >>> #4 0002    2.2   0002_4
> >>> #5 0002    2.6   0002_5
> >>> A.K.
> >>>
> >>>
> >>>
> >>> ----- Original Message -----
> >>> From: Ye Lin <ye...@lbl.gov>
> >>> To: R help <r-help@r-project.org>
> >>> Cc:
> >>> Sent: Tuesday, May 7, 2013 1:54 PM
> >>> Subject: [R] create unique ID for each group
> >>>
> >>> Hey All,
> >>>
> >>> I have a dataset (dat1) like this:
> >>>
> >>> ObsNumber     ID          Weight
> >>>      1                 0001         12
> >>>      2                 0001          13
> >>>      3                 0001           14
> >>>      4                  0002         16
> >>>       5                 0002         17
> >>>
> >>> And another dataset (dat2) like this:
> >>>
> >>> ID               Height
> >>> 0001            3.2
> >>> 0001             2.6
> >>> 0001             3.2
> >>> 0002             2.2
> >>> 0002              2.6
> >>>
> >>> I want to merge dat1 and dat2 based on "ID" in order. I know "match" only
> >>> returns the first match it finds, so I am thinking of creating a unique ID
> >>> column in dat1 and dat2 and then merging. But I don't know how to do that
> >>> so that it looks like this:
> >>>
> >>> dat1:
> >>>
> >>> ObsNumber     ID          Weight  UniqueID
> >>>      1                 0001         12         0001_1
> >>>      2                 0001          13        0001_2
> >>>      3                 0001           14       0001_3
> >>>      4                  0002         16         0002_1
> >>>       5                 0002         17         0002_2
> >>>
> >>> dat2:
> >>>
> >>> ID               Height   UniqueID
> >>> 0001            3.2          0001_1
> >>> 0001             2.6         0001_2
> >>> 0001             3.2         0001_3
> >>> 0002             2.2         0002_1
> >>> 0002              2.6        0002_2
> >>>
> >>> Or, if it is possible to merge dat1 and dat2 by matching "ID" but return
> >>> the matches in order, that would be great!
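To merge by ID "in order" (the first 0001 of dat1 with the first 0001 of dat2, and so on), one option is to number the rows within each ID in both tables and merge on the pair of ID and counter, which turns the many-to-many match into a one-to-one match. A sketch re-creating the example tables above, assuming rows within each ID are already in the intended order; `seq_in_id` is a hypothetical helper column name.

```r
dat1 <- data.frame(ObsNumber = 1:5,
                   ID = c("0001", "0001", "0001", "0002", "0002"),
                   Weight = c(12, 13, 14, 16, 17),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("0001", "0001", "0001", "0002", "0002"),
                   Height = c(3.2, 2.6, 3.2, 2.2, 2.6),
                   stringsAsFactors = FALSE)

# Within-ID running count (1, 2, 3, ...), kept in the original row order.
dat1$seq_in_id <- ave(seq_along(dat1$ID), dat1$ID, FUN = seq_along)
dat2$seq_in_id <- ave(seq_along(dat2$ID), dat2$ID, FUN = seq_along)

# Merging on both columns pairs the k-th occurrence of an ID in dat1
# with the k-th occurrence of the same ID in dat2.
merged <- merge(dat1, dat2, by = c("ID", "seq_in_id"))
merged <- merged[order(merged$ObsNumber), ]
merged
```

If the two tables have different numbers of rows for an ID, the surplus rows have no partner and are dropped; adding `all = TRUE` to merge() would keep them with NA instead.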
> >>>
> >>> Thanks for your help!
> >>>
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>>
> >>
> >
>
