Hi Vivek, I removed the rows with missing values and also duplicated rows. Now, it looks like it is working.
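The kind of check that exposes this problem can be sketched on made-up data. The data frame `toy` and its values are hypothetical and are not taken from your file; the point is only that a repeated or missing ID cannot be used as a row name.

```r
# Toy data (hypothetical values) with one repeated ID and one row of missing values
toy <- data.frame(
  ID   = c("XLOC_000001", "XLOC_000002", "XLOC_000002", NA),
  PGT  = c(112.5, 13.8, 13.8, NA),
  PDGT = c(301.2, 39.2, 39.2, NA),
  stringsAsFactors = FALSE
)

# IDs that occur more than once; these would clash when used as row names
dup_ids <- unique(toy$ID[duplicated(toy$ID)])

# Number of rows with at least one missing value
n_incomplete <- sum(!complete.cases(toy))

# Keep the first occurrence of each ID and drop incomplete rows.
# A logical index built with !duplicated() also works when nothing is duplicated.
clean <- toy[!duplicated(toy$ID) & complete.cases(toy), ]

print(dup_ids)
print(n_incomplete)
print(dim(clean))
```

Running the same three checks on the real input before calling RPadvance() should show whether the gene column is usable as it stands.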
x <- read.table("RP_matrix_FPKM_PGTvsPDGT.txt", header=TRUE, sep="\t")
x1 <- read.table("RP_plaise_FPKM_PGTvsPDGT.txt", header=TRUE, sep="\t")
str(x1)
#'data.frame':    19680 obs. of  6 variables:
# $ ID    : Factor w/ 19678 levels "XLOC_000001",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num  112.47 13.76 62.13 4.16 0 ...
# $ PGT.0 : num  118.83 14.88 94.29 3.49 0 ...
# $ PGT.2 : num  179.324 22.677 117.368 6.36 0.385 ...
# $ PDGT.0: num  301.154 39.165 242.685 9.119 0.126 ...
# $ PDGT.1: num  144.5 30 161.2 3.5 0 ...
str(x)
#'data.frame':    28599 obs. of  6 variables:
# $ gene  : Factor w/ 28599 levels "XLOC_000001",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num  71.25 8.71 14.6 1.99 0 ...
# $ PGT.0 : num  68.36 8.16 9.75 2.4 0 ...
# $ PGT.2 : num  108.17 13.35 18.29 3.64 0 ...
# $ PDGT.0: num  195.01 24.76 40.59 5.61 0 ...
# $ PDGT.1: num  93.06 18.88 26.83 2.14 0 ...

# Every ID in the working file is unique ...
length(unique(x[,1]))
#[1] 28599
# ... but the failing file has 19680 rows and only 19679 distinct IDs
length(unique(x1[,1]))
#[1] 19679

# Keep the first occurrence of each ID
# (!duplicated() is also safe when there are no duplicates)
x2 <- x1[!duplicated(x1[,1]),]
dim(x2)
#[1] 19679     6
# Drop rows that contain missing values
x3 <- na.omit(x2)
dim(x3)
#[1] 19678     6

cl <- c(rep(0,3), rep(1,2))
origin <- c(rep(1,5))
library(RankProd)
RP.out <- RPadvance(x3[,-1], cl, origin, gene.names=as.character(x3[,1]), num.perm=200)

A.K.

________________________________
From: Vivek Das <vd4mm...@gmail.com>
To: arun <smartpink...@yahoo.com>
Sent: Tuesday, August 6, 2013 9:38 AM
Subject: Re: Problem with t-test

No, I tried it again on other files and the error does not appear; it works fine. This is a new file I created. I am sending you the script and the files I am using. It is a non-fussy script that I wrote and have run multiple times with other files. I am sending you two different input files: with one it works and with the other it does not. It fails with the "plaise" file but works with the other input file.

library(RankProd)
x <- read.table("RP_matrix_RF_PGTvsPDGT.txt", header=T, sep="\t")
cl <- c(rep(0,3), rep(1,2))
origin <- c(rep(1,5))
RP.out <- RPadvance(x[,-1], cl, origin, gene.names=x[,1], num.perm=200)
topGene(RP.out, cutoff = 0.1)
#plotRP(RP.out, cutoff = 0.1)
table = topGene(RP.out, cutoff=0.1, method="pfp")
t1 <- table$Table1
t2 <- table$Table2
ind1 <- which(t1[,4] < 0.1)
ind2 <- which(t2[,4] < 0.1)
up <- t1[ind1,]
down <- t2[ind2,]
degs <- rbind(up, down)

----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek....@ieo.eu
        vchris...@yahoo.co.in
        vd4mm...@gmail.com

On Tue, Aug 6, 2013 at 3:17 PM, arun <smartpink...@yahoo.com> wrote:

>Hi Vivek,
>I have never used RankProd before, so I can't guarantee that I can sort out the problem.
>But you can send me the file and the script, and I will try it later.
>You mentioned that RankProd worked before: was that on the same file or a different file? If it was a different file, try running it on that file again and see whether the error repeats.
>
>________________________________
>From: Vivek Das <vd4mm...@gmail.com>
>To: arun <smartpink...@yahoo.com>
>Sent: Tuesday, August 6, 2013 9:09 AM
>Subject: Re: Problem with t-test
>
>Yes, I know this, but I am worried about the consistency of the data, since it will remove a lot of observations and so the results will not be good. In fact, I tested it and I am not getting the p-values I expected. Anyway, I am doing another test with the RankProd package in R. I am encountering a problem here: I have used this package many times but have never seen this. Do you have any idea when we get the error below in RankProd?
>
>Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
>  duplicate 'row.names' are not allowed
>In addition: Warning message:
>non-unique values when setting 'row.names': ‘’
>
>I cannot understand the duplicate 'row.names' complaint, as the rows are gene locations with expression values, and the locations are duplicated more than 2-3 times. I have used such data frames before to compute RankProd and they worked, but now I am getting this error. I can share the script and the file with you if you need them, as the RankProd pipeline is very easy to execute.
>
>If you can give me some idea about the error, that would be good.
>
>On Tue, Aug 6, 2013 at 3:01 PM, arun <smartpink...@yahoo.com> wrote:
>
>>Hi Vivek,
>>No problem.
>>?t.test
>>na.action: a function which indicates what should happen when the data
>>          contain ‘NA’s.  Defaults to ‘getOption("na.action")’.
>>
>>On my system,
>>
>>getOption("na.action")
>>#[1] "na.omit"
>>
>>So it removes the NAs by default and reduces the number of observations.
>>
>>________________________________
>>From: Vivek Das <vd4mm...@gmail.com>
>>To: arun <smartpink...@yahoo.com>
>>Sent: Tuesday, August 6, 2013 8:52 AM
>>Subject: Re: Problem with t-test
>>
>>Yes, actually I just tested a few conditions and found that there are NaN values, and that is why this problem is happening. I cannot proceed with this test and have to change the pipeline to use some other R package for my analysis. Thanks for your input.
>>
>>On Tue, Aug 6, 2013 at 2:42 PM, arun <smartpink...@yahoo.com> wrote:
>>
>>>Hi Vivek,
>>>It looks like the number of observations in each test is 2 (PDGT) and 3 (PGT), respectively. It is possible that some of the entries are NA, so the number of usable observations is too low, which produces the error. It's just a guess, as this is not a reproducible example.
>>>
>>>________________________________
>>>From: Vivek Das <vd4mm...@gmail.com>
>>>To: arun <smartpink...@yahoo.com>
>>>Sent: Tuesday, August 6, 2013 4:29 AM
>>>Subject: Problem with t-test
>>>
>>>data<-read.table("/Users/vdas/Documents/RNA-Seq_Smaples_Udine_08032013/GBM_29052013/UD_RP_25072013/filteredFPKM_matrix.txt",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>> head(data)
>>>           ID Sample_118p Sample_118rp3 Sample_118rz Sample_118z Sample_132p1 Sample_132p2 Sample_132p3 Sample_132rp1 Sample_132rp3 Sample_132rp4 Sample_132rz1
>>>1 XLOC_000001   112.47400     166.17900     81.52270   44.778700   301.154000    118.82700    144.47000    170.407000    406.899000    189.131000     97.183400
>>>2 XLOC_000002    13.76090      17.76730     11.91100    6.290600    39.164800     14.88320     30.02390     42.717200     88.814600     23.310500     15.440800
>>>3 XLOC_000003    62.13010     102.16200    748.31300  273.520000   242.685000     94.28880    161.22800    225.243000    497.011000    160.376000    896.121000
>>>4 XLOC_000004     4.16261       5.71899      4.55739    2.486340     9.119170      3.49082      3.49611      4.975020     12.598600      6.387530      4.949830
>>>5 XLOC_000010     0.00000       0.00000      0.29217    0.270976     0.126338      0.00000      0.00000      0.464747      0.596984      0.199851      0.892021
>>>6 XLOC_000011     3.59279       9.09855      2.57678    1.593230    16.936300      4.47379      6.87020      6.922430     21.762200      7.461560      4.420570
>>>  Sample_132rz2 Sample_132z Sample_141p1 Sample_141p2 Sample_141p3 Sample_141p4 Sample_141z Sample_183p1 Sample_183p2 Sample_183p3 Sample_183z Sample_91p
>>>1     72.739000   386.81000     86.96600    85.703100     53.01000    158.31400   145.84300   219.667000   240.231000    127.42000    78.58140 179.324000
>>>2      7.475080    40.35110     12.61660    12.737300     10.96970     28.26550    22.65940    27.217700    27.832800     18.21300     7.88030  22.676900
>>>3    465.496000  2330.57000     72.35270    73.962600     71.36860    203.20100  1048.81000   172.241000   183.260000     98.11680   473.46400 117.368000
>>>4      4.818980    18.22750      3.22435     2.074460      1.97518      4.05074     8.86568     5.118540     6.414700      4.65076     4.37495   6.360260
>>>5      0.863341     2.91729      0.00000     0.226087      0.00000      0.00000     2.16320     0.356073     0.655415      0.00000     1.15980   0.385098
>>>6      3.341780    15.43730      5.21231     3.854980      2.53136      6.18972     4.83315     6.908790    12.524200      5.96035     3.40959   8.604070
>>>  Sample_91rp1 Sample_91rp3 Sample_91rp4 Sample_91rz
>>>1   297.395000   203.550000    251.53800  110.898000
>>>2    28.945600    18.749300     22.76070   15.679000
>>>3   174.073000   119.605000    122.66100  754.735000
>>>4     9.227550     6.656250      8.82010    7.172210
>>>5     0.718336     0.187613      0.34955    0.498937
>>>6    15.908700     8.162870      9.35126    6.013790
>>>> PGT<-cbind(data[,2],data[,7],data[,24])
>>>> head(PGT)
>>>          [,1]      [,2]       [,3]
>>>[1,] 112.47400 118.82700 179.324000
>>>[2,]  13.76090  14.88320  22.676900
>>>[3,]  62.13010  94.28880 117.368000
>>>[4,]   4.16261   3.49082   6.360260
>>>[5,]   0.00000   0.00000   0.385098
>>>[6,]   3.59279   4.47379   8.604070
>>>> PDGT<-cbind(data[,6],data[,8])
>>>
>>>pval2<-NULL
>>>> for(i in 1:length(PGT[,1])){
>>>+ pval2<-c(pval2,t.test(as.numeric(PDGT[i,]),as.numeric(PGT[i,]))$p.value)
>>>+ print(i)
>>>+ }
>>>
>>>Error:
>>>Error in t.test.default(as.numeric(PDGT[i, ]), as.numeric(PGT[i, ])) :
>>>  not enough 'x' observations
>>>
>>>I cannot understand what went wrong with the vector. Can you please tell me?
>>>I am not able to figure it out.
>>>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
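
One possible way to keep the row-wise t-test loop from the original question from stopping on genes with too few usable values is sketched below. The two matrices hold made-up numbers, and `safe_p` is a hypothetical helper, not part of any package. It assumes that returning NA for such genes is acceptable for the analysis.

```r
# Hypothetical expression values: rows are genes, columns are replicates
PGT  <- rbind(c(112.47, 118.83, 179.32),
              c(13.76,  14.88,  22.68),
              c(NA,     NA,     0.39))
PDGT <- rbind(c(301.15, 144.47),
              c(39.16,  30.02),
              c(NA,     0.13))

# Welch t-test p-value for one gene, or NA when either group has fewer
# than two non-missing values (the case in which t.test() stops with
# "not enough 'x' observations" or "not enough 'y' observations").
# is.na() is TRUE for both NA and NaN, so NaN entries are dropped too.
safe_p <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  if (length(x) < 2 || length(y) < 2) return(NA_real_)
  # Any other t.test() failure (e.g. constant data) also yields NA
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

# One p-value per gene, preallocated by vapply() instead of grown with c()
pval <- vapply(seq_len(nrow(PGT)),
               function(i) safe_p(PDGT[i, ], PGT[i, ]),
               numeric(1))
print(pval)
```

Counting the NA entries in `pval` afterwards shows how many genes were skipped, which speaks to the concern about losing observations.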