Hi Jim; I do apologize if bothering. I have run on the real data and here is the error message I got:
Thanks, Greg start end Error in regrange[1]:regrange[2] : result would be too long a vector In addition: Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf On Mon, Jun 13, 2016 at 10:28 AM, greg holly <mak.hho...@gmail.com> wrote: > Hi Jim; > > I do apologize if bothering. I have run on the real data and here is the > error message I got: > > Thanks, > > Greg > > start end > Error in regrange[1]:regrange[2] : result would be too long a vector > In addition: Warning messages: > 1: In min(x) : no non-missing arguments to min; returning Inf > 2: In max(x) : no non-missing arguments to max; returning -Inf > > > On Mon, Jun 13, 2016 at 3:19 AM, Jim Lemon <drjimle...@gmail.com> wrote: > >> Hi Greg, >> Okay, I have a better idea now of what you want. The problem of >> multiple matches is still there, but here is a start: >> >> # this data frame actually contains all the values in ref in the "reg" >> field >> map<-read.table(text="reg p rate >> 10276 0.700 3.867e-18 >> 71608 0.830 4.542e-16 >> 29220 0.430 1.948e-15 >> 99542 0.220 1.084e-15 >> 26441 0.880 9.675e-14 >> 95082 0.090 7.349e-13 >> 36169 0.480 9.715e-13 >> 55572 0.500 9.071e-12 >> 65255 0.300 1.688e-11 >> 51960 0.970 1.163e-10 >> 55652 0.388 3.750e-10 >> 63933 0.250 9.128e-10 >> 35170 0.720 7.355e-09 >> 06491 0.370 1.634e-08 >> 85508 0.470 1.057e-07 >> 86666 0.580 7.862e-07 >> 04758 0.810 9.501e-07 >> 06169 0.440 1.104e-06 >> 63933 0.750 2.624e-06 >> 41838 0.960 8.119e-06 >> 74806 0.810 9.501e-07 >> 92643 0.470 1.057e-07 >> 73732 0.090 7.349e-13 >> 82451 0.960 8.119e-06 >> 86042 0.480 9.715e-13 >> 93502 0.500 9.071e-12 >> 85508 0.370 1.634e-08 >> 95082 0.830 4.542e-16", >> header=TRUE) >> # same as in your example >> ref<-read.table(text="reg1 reg2 >> 29220 63933 >> 26441 41838 >> 06169 10276 >> 74806 92643 >> 73732 82451 >> 86042 93502 >> 85508 95082", >> header=TRUE) >> # sort the "map" data frame >> map2<-map[order(map$reg),] >> # get a field for the counts >> ref$n<-NA >> # and a field for the minimum p values >> ref$min_p<-NA >> # get the number of rows in "ref" >> nref<-dim(ref)[1] >> for(i in 1:nref) { >> start<-which(map2$reg==ref$reg1[i]) >> end<-which(map2$reg==ref$reg2[i]) >> cat("start",start,"end",end,"\n") >> # get the range of matches >> regrange<-range(c(start,end)) >> # convert this to a sequence spanning all matches >> allreg<-regrange[1]:regrange[2] >> ref$n[i]<-sum(map2$p[allreg] > 0.85) >> ref$min_p[i]<-min(map2$p[allreg]) >> } >> >> This example uses the span from the first match of "reg1" to the last >> match of "reg2". This may not be what you want, so let me know if >> there are further constraints. >> >> Jim >> >> On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hho...@gmail.com> >> wrote: >> > Hi Bert; >> > >> > I do appreciate for this. I need check your codes on task2 tomorrow at >> my >> > office on the real data as I have difficulty (because a technical >> issue) to >> > remote connection. I am sure it will work well. >> > >> > I am sorry that I was not able to explain my first question. Basically >> > >> > Values in ref data represent the region of chromosome. I need choose >> these >> > regions in map (all regions values in ref data are exist in map data in >> the >> > first column -column map$reg). And then summing up the column "map$rate >> and >> > count the numbers that gives >0.85. For example, consider the first >> row in >> > data ref. They are 29220 and 63933. After sorting the first column in >> > map then summing column "map$rate" only between 29220 to 63933 in >> sorted >> > map and cut off at >0.85. Then count how many rows in sorted map gives >> >>0.85. For example consider there are 38 rows between 29220 in 63933 >> in sorted >> > map$reg and only summing first 12 of them gives>0.85. Then my answer is >> > going to be 12 for 29220 - 63933 in ref. >> > >> > Thanks I lot for your patience. >> > >> > Cheers, >> > Greg >> > >> > On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hho...@gmail.com> >> wrote: >> > >> >> Hi Bert; >> >> >> >> I do appreciate for this. I need check your codes on task2 tomorrow at >> my >> >> office on the real data as I have difficulty (because a technical >> issue) to >> >> remote connection. I am sure it will work well. >> >> >> >> I am sorry that I was not able to explain my first question. Basically >> >> >> >> Values in ref data represent the region of chromosome. I need choose >> these >> >> regions in map (all regions values in ref data are exist in map data >> in the >> >> first column -column map$reg). And then summing up the column >> "map$rate and >> >> count the numbers that gives >0.85. For example, consider the first >> row in >> >> data ref. They are 29220 and 63933. After sorting the first column >> in >> >> map then summing column "map$rate" only between 29220 to 63933 in >> >> sorted map and cut off at >0.85. Then count how many rows in sorted map >> >> gives >0.85. For example consider there are 38 rows between 29220 in >> >> 63933 in sorted map$reg and only summing first 12 of them gives>0.85. >> >> Then my answer is going to be 12 for 29220 - 63933 in ref. >> >> >> >> Thanks I lot for your patience. >> >> >> >> Cheers, >> >> Greg >> >> >> >> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <bgunter.4...@gmail.com> >> >> wrote: >> >> >> >>> Greg: >> >>> >> >>> I was not able to understand your task 1. Perhaps others can. >> >>> >> >>> My understanding of your task 2 is that for each row of ref, you wish >> >>> to find all rows,of map such that the reg values in those rows fall >> >>> between the reg1 and reg2 values in ref (inclusive change <= to < if >> >>> you don't want the endpoints), and then you want the minimum map$p >> >>> values of all those rows. If that is correct, I believe this will do >> >>> it (but caution, untested, as you failed to provide data in a >> >>> convenient form, e.g. using dput() ) >> >>> >> >>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i) >> >>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0)) >> >>> >> >>> >> >>> If my understanding is incorrect, please ignore both the above and the >> >>> following: >> >>> >> >>> >> >>> The "solution" I have given above seems inefficient, so others may be >> >>> able to significantly improve it if you find that it takes too long. >> >>> OTOH, my understanding of your specification is that you need to >> >>> search for all rows in map data frame that meet the criterion for each >> >>> row of ref, and without further information, I don't know how to do >> >>> this without just repeating the search 560 times. >> >>> >> >>> >> >>> Cheers, >> >>> Bert >> >>> >> >>> >> >>> Bert Gunter >> >>> >> >>> "The trouble with having an open mind is that people keep coming along >> >>> and sticking things into it." >> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >>> >> >>> >> >>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hho...@gmail.com> >> wrote: >> >>> > Dear all; >> >>> > >> >>> > >> >>> > >> >>> > I have two data sets, data=map and data=ref). A small part of each >> data >> >>> set >> >>> > are given below. Data map has more than 27 million and data ref has >> >>> about >> >>> > 560 rows. Basically I need run two different task. My R codes for >> these >> >>> > task are given below but they do not work properly. >> >>> > >> >>> > I sincerely do appreciate your helps. >> >>> > >> >>> > >> >>> > Regards, >> >>> > >> >>> > Greg >> >>> > >> >>> > >> >>> > >> >>> > Task 1) >> >>> > >> >>> > For example, the first and second columns for row 1 in data ref are >> >>> 29220 >> >>> > 63933. So I need write an R code normally first look the first row >> in >> >>> ref >> >>> > (which they are 29220 and 63933) than summing the column of >> "map$rate" >> >>> and >> >>> > give the number of rows that >0.85. Then do the same for the second, >> >>> > third....in ref. At the end I would like a table gave below (the >> >>> results I >> >>> > need). Please notice the all value specified in ref data file are >> exist >> >>> in >> >>> > map$reg column. >> >>> > >> >>> > >> >>> > >> >>> > Task2) >> >>> > >> >>> > Again example, the first and second columns for row 1 in data ref >> are >> >>> 29220 >> >>> > 63933. So I need write an R code give the minimum map$p for the >> 29220 >> >>> > -63933 intervals in map file. Than >> >>> > >> >>> > do the same for the second, third....in ref. >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > #my attempt for the first question >> >>> > >> >>> > temp<-map[order(map$reg, map$p),] >> >>> > >> >>> > count<-1 >> >>> > >> >>> > temp<-unique(temp$reg >> >>> > >> >>> > for(i in 1:length(ref) { >> >>> > >> >>> > for(j in 1:length(ref) >> >>> > >> >>> > { >> >>> > >> >>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] & >> (temp[pos[j]==ref[ref$reg2,] >> >>> > & temp[cumsum(temp$rate) >> >>> >>0.70,]) >> >>> > >> >>> > count=count+1 >> >>> > >> >>> > } >> >>> > >> >>> > } >> >>> > >> >>> > #my attempt for the second question >> >>> > >> >>> > >> >>> > >> >>> > temp<-map[order(map$reg, map$p),] >> >>> > >> >>> > count<-1 >> >>> > >> >>> > temp<-unique(temp$reg >> >>> > >> >>> > for(i in 1:length(ref) { >> >>> > >> >>> > for(j in 1:length(ref) >> >>> > >> >>> > { >> >>> > >> >>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] & >> (temp[pos[j]==ref[ref$reg2,]) >> >>> > >> >>> > output<-temp2[temp2$p==min(temp2$p),] >> >>> > >> >>> > } >> >>> > >> >>> > } >> >>> > >> >>> > >> >>> > >> >>> > Data sets >> >>> > >> >>> > >> >>> > Data= map >> >>> > >> >>> > reg p rate >> >>> > >> >>> > 10276 0.700 3.867e-18 >> >>> > >> >>> > 71608 0.830 4.542e-16 >> >>> > >> >>> > 29220 0.430 1.948e-15 >> >>> > >> >>> > 99542 0.220 1.084e-15 >> >>> > >> >>> > 26441 0.880 9.675e-14 >> >>> > >> >>> > 95082 0.090 7.349e-13 >> >>> > >> >>> > 36169 0.480 9.715e-13 >> >>> > >> >>> > 55572 0.500 9.071e-12 >> >>> > >> >>> > 65255 0.300 1.688e-11 >> >>> > >> >>> > 51960 0.970 1.163e-10 >> >>> > >> >>> > 55652 0.388 3.750e-10 >> >>> > >> >>> > 63933 0.250 9.128e-10 >> >>> > >> >>> > 35170 0.720 7.355e-09 >> >>> > >> >>> > 06491 0.370 1.634e-08 >> >>> > >> >>> > 85508 0.470 1.057e-07 >> >>> > >> >>> > 86666 0.580 7.862e-07 >> >>> > >> >>> > 04758 0.810 9.501e-07 >> >>> > >> >>> > 06169 0.440 1.104e-06 >> >>> > >> >>> > 63933 0.750 2.624e-06 >> >>> > >> >>> > 41838 0.960 8.119e-06 >> >>> > >> >>> > >> >>> > data=ref >> >>> > >> >>> > reg1 reg2 >> >>> > >> >>> > 29220 63933 >> >>> > >> >>> > 26441 41838 >> >>> > >> >>> > 06169 10276 >> >>> > >> >>> > 74806 92643 >> >>> > >> >>> > 73732 82451 >> >>> > >> >>> > 86042 93502 >> >>> > >> >>> > 85508 95082 >> >>> > >> >>> > >> >>> > >> >>> > the results I need >> >>> > >> >>> > reg1 reg2 n >> >>> > >> >>> > 29220 63933 12 >> >>> > >> >>> > 26441 41838 78 >> >>> > >> >>> > 06169 10276 125 >> >>> > >> >>> > 74806 92643 11 >> >>> > >> >>> > 73732 82451 47 >> >>> > >> >>> > 86042 93502 98 >> >>> > >> >>> > 85508 95082 219 >> >>> > >> >>> > [[alternative HTML version deleted]] >> >>> > >> >>> > ______________________________________________ >> >>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >>> > https://stat.ethz.ch/mailman/listinfo/r-help >> >>> > PLEASE do read the posting guide >> >>> http://www.R-project.org/posting-guide.html >> >>> > and provide commented, minimal, self-contained, reproducible code. >> >>> >> >> >> >> >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.