Re: [R] R help for creating expression data of Differentially expressed genes

Vivek Das Tue, 07 May 2013 15:15:47 -0700

Hi Arun,

My data sets are as in the provided files; I am providing sample files.
I hope these give a better idea of the kind of work I want to do with the
two files and the kind of script I am trying to write. I hope you can give
me some suggestions. I am new to R, so I am having trouble working out
which functions to use for this.


Any help with this would be greatly appreciated.


----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek....@ieo.eu
            vchris...@yahoo.co.in
            vd4mm...@gmail.com


On Tue, May 7, 2013 at 10:36 PM, arun <smartpink...@yahoo.com> wrote:

> Hi Vivek,
>
> Maybe this helps:
> set.seed(35)
>  dat1<- cbind(ID=1:8,
> as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))
>
> set.seed(38)
> dat2<- cbind(ID= sample(1:20,8,replace=FALSE),
> as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
> colnames(dat2)[-1]<-gsub("V","X",colnames(dat2)[-1])
>  merge(dat1[,1:2],dat2[,1:31],by="ID")
> #  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
> #1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
> #2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
> #3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
> #4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
> #5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
> #  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
> #1  14  27   3  21   6  44  33  42  10  29
> #2  48  13   8  47  18   9  23   9  44   3
> #3  25  14  31  19  14   6  26  13   6  49
> #4  43  28  15   6   9  19  43  21  41  21
> #5   1  27  18   3  42   5  16  39  46  47
> A.K.
>
>
>
> ----- Original Message -----
> From: Vivek Das <vd4mm...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Cc:
> Sent: Tuesday, May 7, 2013 3:45 PM
> Subject: R help for creating expression data of Differentially expressed
> genes
>
> Hi Arun,
>
> I need some help with R scripting. I have two data files, one
> containing seven columns and the other containing 33. Both files have
> a unique identifier, ID. I want to create another file that has
> the first two columns of the first file and the 31 columns of the
> second file, matched on ID. The first file has gene IDs
> and gene names for around 500 genes, and I want an output file that has
> all of those plus the other attributes. That is, the output should have
> all the attributes that match the IDs of the first file, so that I get
> 500 rows with all the attributes of the second file. I am new to R
> and am having trouble with the merge function. If you can help, that
> would be great.
>
> Regards,
> Vivek
>
> Sent from my iPad
>
> On 07/mag/2013, at 21:13, arun <smartpink...@yahoo.com> wrote:
>
> > Hi Ye,
> >
> > For the NA in the ID column:
> >
> > dat1<- read.table(text="
> > ObsNumber     ID          Weight
> >      1                 0001         12
> >      2                 0001          13
> >      3                 0001           14
> >      4                  0002         16
> >       5                 0002         17
> >      6                   N/A          18
> >
> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"),na.strings="N/A")
> >  unlist(lapply(split(dat1,dat1$ID),function(x)
> with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> > #[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
> > A.K.
> > ________________________________
> > From: Ye Lin <ye...@lbl.gov>
> > To: arun <smartpink...@yahoo.com>
> > Cc: R help <r-help@r-project.org>
> > Sent: Tuesday, May 7, 2013 2:54 PM
> > Subject: Re: [R] create unique ID for each group
> >
> >
> >
> > Thanks A.K. But I have "NA" in the ID column, so when I apply the code, it
> gives me an error saying the replacement has fewer rows than the data.
> Is there any way, for ID=N/A, to return something like "N/A_1" in order as well?
> >
> >
> >
> >
> >
> >
> > On Tue, May 7, 2013 at 11:17 AM, arun <smartpink...@yahoo.com> wrote:
> >
> > Hi,
> >> Sorry, a mistake:
> >> dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x)
> with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >> dat1
> >>  # ObsNumber   ID Weight UniqueID
> >> #1         1 0001     12   0001_1
> >> #2         2 0001     13   0001_2
> >> #3         3 0001     14   0001_3
> >> #4         4 0002     16   0002_1
> >> #5         5 0002     17   0002_2
> >>
> >> dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x)
> with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >>
> >> A.K.
> >>
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >>
> >> From: arun <smartpink...@yahoo.com>
> >> To: Ye Lin <ye...@lbl.gov>
> >> Cc: R help <r-help@r-project.org>
> >> Sent: Tuesday, May 7, 2013 2:10 PM
> >> Subject: Re: [R] create unique ID for each group
> >>
> >>
> >>
> >> Hi,
> >>
> >> Try this:
> >> dat1<- read.table(text="
> >> ObsNumber     ID          Weight
> >>      1                 0001         12
> >>      2                 0001          13
> >>      3                 0001           14
> >>      4                  0002         16
> >>       5                 0002         17
> >> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"))
> >> dat2<- read.table(text="
> >> ID               Height
> >> 0001            3.2
> >> 0001             2.6
> >> 0001             3.2
> >> 0002             2.2
> >> 0002              2.6
> >> ",sep="",header=TRUE,colClasses=c("character","numeric"))
> >>
> dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
> >>
> dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
> >>  dat2
> >> #    ID Height UniqueID
> >> #1 0001    3.2   0001_1
> >> #2 0001    2.6   0001_2
> >> #3 0001    3.2   0001_3
> >> #4 0002    2.2   0002_4
> >> #5 0002    2.6   0002_5
> >> A.K.
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: Ye Lin <ye...@lbl.gov>
> >> To: R help <r-help@r-project.org>
> >> Cc:
> >> Sent: Tuesday, May 7, 2013 1:54 PM
> >> Subject: [R] create unique ID for each group
> >>
> >> Hey All,
> >>
> >> I have a dataset(dat1) like this:
> >>
> >> ObsNumber     ID          Weight
> >>      1                 0001         12
> >>      2                 0001          13
> >>      3                 0001           14
> >>      4                  0002         16
> >>       5                 0002         17
> >>
> >> And another dataset(dat2) like this:
> >>
> >> ID               Height
> >> 0001            3.2
> >> 0001             2.6
> >> 0001             3.2
> >> 0002             2.2
> >> 0002              2.6
> >>
> >> I want to merge dat1 and dat2 based on "ID", in order. I know "match" only
> >> returns the first match it finds, so I am thinking of creating a unique ID
> >> column in dat1 and dat2 and then merging. But I don't know how to do that
> >> so that it looks like this:
> >>
> >> dat1:
> >>
> >> ObsNumber     ID          Weight  UniqueID
> >>      1                 0001         12         0001_1
> >>      2                 0001          13        0001_2
> >>      3                 0001           14       0001_3
> >>      4                  0002         16         0002_1
> >>       5                 0002         17         0002_2
> >>
> >> dat2:
> >>
> >> ID               Height   UniqueID
> >> 0001            3.2          0001_1
> >> 0001             2.6         0001_2
> >> 0001             3.2         0001_3
> >> 0002             2.2         0002_1
> >> 0002              2.6        0002_2
> >>
> >> Or if it is possible to merge dat1 and dat2 by matching "ID" but return the
> >> matches in order, that would be great!
> >>
> >> Thanks for your help!
> >>
> >>     [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
>

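For the quoted question about missing IDs, one possible alternative is to
number the rows within each ID using base R's ave(), after recoding NA to a
literal label. This is only a sketch: the "N/A" label and the helper names
`id` and `n` are assumptions for illustration and do not come from the
replies above. Unlike the split()-based version, ave() returns its result in
the original row order, so the data does not need to be sorted by ID first.

```r
# Sample data from the quoted thread, with one missing ID
dat1 <- read.table(text = "
ObsNumber ID   Weight
1         0001 12
2         0001 13
3         0001 14
4         0002 16
5         0002 17
6         N/A  18
", header = TRUE, colClasses = c("numeric", "character", "numeric"),
   na.strings = "N/A")

# Recode missing IDs to a literal label so they form their own group
# (the label "N/A" is an assumption)
id <- ifelse(is.na(dat1$ID), "N/A", dat1$ID)

# Running count within each ID, kept in the original row order
n <- ave(seq_along(id), id, FUN = seq_along)

dat1$UniqueID <- paste(id, n, sep = "_")
dat1$UniqueID
```
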
ID test_ID gene locus Sample_118p_0 Sample_118rp3_0 Sample_118rz_0 Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0 Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0 Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0 Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0 Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0
XLOC_000009 XLOC_025681 NEFL chr8:24808468-24814131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000010 XLOC_025681 NEFL chr8:24808468-24814131 0 0 0.29217 0.270976 0.126338 0 0 0.464747 0.596984 0.199851 0.892021 0.863341 2.91729 0 0.226087 0 0 2.1632 0.356073 0.655415 0 1.1598 0.385098 0.718336 0.187613 0.34955 0.498937
XLOC_000011 XLOC_022130 "HLA-DRB1,HLA-DRB5" chr6:32441213-32557613 3.59279 9.09855 2.57678 1.59323 16.9363 4.47379 6.8702 6.92243 21.7622 7.46156 4.42057 3.34178 15.4373 5.21231 3.85498 2.53136 6.18972 4.83315 6.90879 12.5242 5.96035 3.40959 8.60407 15.9087 8.16287 9.35126 6.01379
XLOC_000012 XLOC_003321 CCDC3 chr10:12938624-13043704 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.581209 0.455395 0 0
XLOC_000013 XLOC_005027 CD248 chr11:66081957-66084515 0.248183 0.234721 0.145036 0.0538057 0.288489 0.120182 0.138705 0.138422 0.474156 0.297623 0.177122 0.149999 0.537889 0.0951497 0.112231 0.0610627 0.134862 0.257719 0.212109 0.325353 0.0387095 0.191911 0.229399 0.332815 0.0745058 0.225575 0.198141
XLOC_000014 XLOC_021040 STC2 chr5:172741725-172756506 0 0 0 0 0 0 0 0.0364255 0.0701849 0 0 0 0.0979922 0 0 0 0 0.101727 0 0 0 0 0 0 0 0.0410951 0.0586578

ID test_ID gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
XLOC_000009 XLOC_025681 NEFL chr8:24808468-24814131 Sample_118p Sample_118rp3 OK 0.14678 84.3686 9.1669 -4.83529 1.33E-06 0.0261296 yes
XLOC_000010 XLOC_025681 NEFL chr8:24808468-24814131 Sample_118p Sample_118z OK 0.14678 64.1788 8.77229 -4.63808 3.52E-06 0.0401193 yes
XLOC_000011 XLOC_022130 "HLA-DRB1,HLA-DRB5" chr6:32441213-32557613 Sample_118rz Sample_118z OK 3.18746 9.29E+06 21.4749 -5.75217 8.81E-09 0.00280103 yes
XLOC_000012 XLOC_003321 CCDC3 chr10:12938624-13043704 Sample_118p Sample_132p1 OK 0.0184144 83.7839 12.1516 -4.77738 1.78E-06 0.0288706 yes
XLOC_000013 XLOC_005027 CD248 chr11:66081957-66084515 Sample_118p Sample_132p1 OK 0.280334 216.614 9.59377 -5.10742 3.27E-07 0.0159446 yes
XLOC_000014 XLOC_021040 STC2 chr5:172741725-172756506 Sample_118p Sample_132p1 OK 0.187273 69.3633 8.53289 -4.73246 2.22E-06 0.0320926 yes

ID Sample_118p_0 Sample_118rp3_0 Sample_118rz_0 Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0 Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0 Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0 Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0 Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0
XLOC_000001 112.474 166.179 81.5227 44.7787 301.154 118.827 144.47 170.407 406.899 189.131 97.1834 72.739 386.81 86.966 85.7031 53.01 158.314 145.843 219.667 240.231 127.42 78.5814 179.324 297.395 203.55 251.538 110.898
XLOC_000002 13.7609 17.7673 11.911 6.2906 39.1648 14.8832 30.0239 42.7172 88.8146 23.3105 15.4408 7.47508 40.3511 12.6166 12.7373 10.9697 28.2655 22.6594 27.2177 27.8328 18.213 7.8803 22.6769 28.9456 18.7493 22.7607 15.679
XLOC_000003 62.1301 102.162 748.313 273.52 242.685 94.2888 161.228 225.243 497.011 160.376 896.121 465.496 2330.57 72.3527 73.9626 71.3686 203.201 1048.81 172.241 183.26 98.1168 473.464 117.368 174.073 119.605 122.661 754.735
XLOC_000004 4.16261 5.71899 4.55739 2.48634 9.11917 3.49082 3.49611 4.97502 12.5986 6.38753 4.94983 4.81898 18.2275 3.22435 2.07446 1.97518 4.05074 8.86568 5.11854 6.4147 4.65076 4.37495 6.36026 9.22755 6.65625 8.8201 7.17221
XLOC_000005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000006 0 0.103125 0 0 0 0.0829754 0 0 0 0 0 0 0 0 0 0 0 0 0 0.15724 0 0 0 0.11489 0.0900197 0 0
XLOC_000007 0.0282754 0.0218796 0 0 0.0385837 0 0.0129295 0.0315409 0.0303866 0 0 0 0 0 0 0 0 0 0 0.0333607 0.0396915 0 0.0392031 0 0 0 0
XLOC_000008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000009 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000010 0 0 0.29217 0.270976 0.126338 0 0 0.464747 0.596984 0.199851 0.892021 0.863341 2.91729 0 0.226087 0 0 2.1632 0.356073 0.655415 0 1.1598 0.385098 0.718336 0.187613 0.34955 0.498937
XLOC_000011 3.59279 9.09855 2.57678 1.59323 16.9363 4.47379 6.8702 6.92243 21.7622 7.46156 4.42057 3.34178 15.4373 5.21231 3.85498 2.53136 6.18972 4.83315 6.90879 12.5242 5.96035 3.40959 8.60407 15.9087 8.16287 9.35126 6.01379
XLOC_000012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.581209 0.455395 0 0
XLOC_000013 0.248183 0.234721 0.145036 0.0538057 0.288489 0.120182 0.138705 0.138422 0.474156 0.297623 0.177122 0.149999 0.537889 0.0951497 0.112231 0.0610627 0.134862 0.257719 0.212109 0.325353 0.0387095 0.191911 0.229399 0.332815 0.0745058 0.225575 0.198141
XLOC_000014 0 0 0 0 0 0 0 0.0364255 0.0701849 0 0 0 0.0979922 0 0 0 0 0.101727 0 0 0 0 0 0 0 0.0410951 0.0586578
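
One way the merge described in this thread might be scripted, shown on a
cut-down copy of the sample data (three expression columns instead of 27).
This is a sketch under assumptions: the object names `diff_genes` and `expr`
are made up for illustration, the choice of keeping ID, test_ID, gene and
locus follows the layout of the first sample table, and `all.x = TRUE` is
assumed so that every ID from the first table is kept even without a match.

```r
# Cut-down copies of the sample tables (object names are assumptions)
diff_genes <- read.table(text = '
ID          test_ID     gene                locus
XLOC_000009 XLOC_025681 NEFL                chr8:24808468-24814131
XLOC_000011 XLOC_022130 "HLA-DRB1,HLA-DRB5" chr6:32441213-32557613
XLOC_000012 XLOC_003321 CCDC3               chr10:12938624-13043704
', header = TRUE, stringsAsFactors = FALSE)

expr <- read.table(text = "
ID          Sample_118p_0 Sample_118rp3_0 Sample_118rz_0
XLOC_000001 112.474       166.179         81.5227
XLOC_000009 0             0               0
XLOC_000011 3.59279       9.09855         2.57678
XLOC_000012 0             0               0
", header = TRUE, stringsAsFactors = FALSE)

# Keep the annotation columns of the first table and attach every
# expression column whose ID matches; all.x = TRUE keeps IDs that have
# no match in expr (their expression columns are filled with NA)
merged <- merge(diff_genes[, c("ID", "test_ID", "gene", "locus")],
                expr, by = "ID", all.x = TRUE)
merged
```

With real files, the two read.table(text = ...) calls would be replaced by
read.delim() on the actual file paths, and the result could be written out
with write.table().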

