Re: [R] left end or right end
First of all, read the posting guide carefully: http://www.R-project.org/posting-guide.html

Your question is far from clear. When you say that the lengths of P and Q are different, do you mean the length of the data or the difference between start and end? That makes a world of difference.

Regarding the statistical test, that depends on what your data represent. Is it possible for P to fall close to both the left and the right end of Q? For example:

  P   ----------
  Q  ------------

You should also specify which test you want to use. Then people on the list will be able to tell you whether it is available in R. You can of course construct your own test with the tools R provides, but again, this requires a lot more information.

Beyond that, the list is not actually intended for statistical advice, but for advice regarding R code. Maybe somebody will join in with some statistical guidance, but if you don't know what to do, you had better consult a statistician at your department.

Cheers
Joris

On Thu, Jul 1, 2010 at 1:53 PM, ravikumar sukumar ravikumarsuku...@gmail.com wrote:
> Dear all,
> I am a biologist. I have two sets of distance P(start1, end1) and Q(start2, end2). The distance will be like this.
> P Q
> I want to know whether P falls closely to the right end or left end of Q.
> P and Q are of different lengths for each data point. There are more than 1 pairs of P and Q.
> Is there any test or function in R to bring a statistically significant conclusion.
> Thanks for all,
> Suku

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] left end or right end
Suku,

It looks like you might want to consult with a [bio]statistician, but I'm interested in what these distances represent. Can you give some additional context for your problem? How were these distances collected? Is it a collection of pairs of intervals, like this:

        P                Q
  1)    (1.5, 1.8)       (1.2, 2.0)
  2)    (1.4, 1.9)       (1.4, 2.3)
  ...
        (start1, end1)   (start2, end2)

?

If so, is there a more specific test you're interested in? For instance, whether the interval P overlaps with the start/stop position of interval Q, or whether start1 == start2, or end1 == end2, or both? I can think of a bootstrap test for hypotheses like this, and this is relatively easy in R.

-Matt

--
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com
Re: [R] left end or right end
On Jul 1, 2010, at 7:53 AM, ravikumar sukumar wrote:

> Dear all,
> I am a biologist. I have two sets of distance P(start1, end1) and Q(start2, end2). The distance will be like this.
> P Q
> I want to know whether P falls closely to the right end or left end of Q.
> P and Q are of different lengths for each data point.

Do you want to know whether

    P(start1) - Q(start2) < P(end1) - Q(end2)

The arithmetic operators and comparison operators are vectorized.

> There are more than 1 pairs of P and Q.

You could offer an example:

?head

> Is there any test or function in R to bring a statistically significant conclusion.

?binom.test  # if my interpretation above is what you were asking.

> Thanks for all,
> Suku

David Winsemius, MD
West Hartford, CT
Re: [R] left end or right end
On Jul 1, 2010, at 9:00 AM, David Winsemius wrote:

> Do you want to know whether

Should have been:

    abs( P(start1) - Q(start2) ) < abs( P(end1) - Q(end2) )

> The arithmetic operators and comparison operators are vectorized.

David Winsemius, MD
West Hartford, CT
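A minimal sketch of the comparison and the binomial test suggested above. The intervals are made up for illustration, and dropping ties before the test is an assumption, not something stated in the thread.

```r
# Made-up example intervals (start, end); not data from the original poster.
P <- matrix(c(12, 28,
               2,  5,
              32, 50,
              40, 60,
              70, 95), ncol = 2, byrow = TRUE)
Q <- matrix(c(10,  42,
               1,  55,
              22,  63,
               5,  62,
              10, 100), ncol = 2, byrow = TRUE)

# Vectorized gaps between the left ends and between the right ends.
left.gap  <- abs(P[, 1] - Q[, 1])
right.gap <- abs(P[, 2] - Q[, 2])

# Assumption: ties say nothing about direction, so they are dropped.
keep        <- left.gap != right.gap
closer.left <- left.gap[keep] < right.gap[keep]

# H0: P is equally likely to sit nearer either end (probability 0.5).
bt <- binom.test(sum(closer.left), length(closer.left), p = 0.5)
print(bt$p.value)
```

With only a handful of pairs the test has almost no power; it becomes informative only with many pairs, and it ignores how large the difference between the two gaps is.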
Re: [R] left end or right end
Sorry for posting to the R list.

    P          Q
    12, 28     10, 42
    2, 5       1, 55
    32, 50     22, 63
    ...

There are 1 points of P and Q. The number of points of P and Q are equal (i.e. 1). The interval P always overlaps with Q, i.e. start1 > start2 and end1 < end2. Merely calculating whether points satisfy this condition (start1 > start2 and end1 < end2) will not be significant, and the length of P, that is length(end1-start1), and the length of Q, i.e. length(end2-start2), differ.

Example
Case A: start2-start1 = 2, end2-end1 = 3
Case B: start2-start1 = 100, end2-end1 = 2

In the above two cases, P is falling on the right end of Q in case B. But it depends on length(end2-start2). If length(end2-start2) = 15000 in case B, then it is almost on the middle point.

Is there any test or function in R to bring a statistically significant conclusion that the midpoint of P, or P itself, is falling on the left end or right end of Q?

Sorry once again for posting in this list.

Regards
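The point about the length of Q can be made concrete by expressing the midpoint of P as a fraction of Q's length (0 = left end of Q, 1 = right end). This is only a sketch: the helper name rel.mid is hypothetical, and the Case B coordinates are invented to match the stated gaps of 100 and 2 inside a Q of length 15000.

```r
# The three pairs listed in the message, as (start, end) rows.
P <- matrix(c(12, 28,
               2,  5,
              32, 50), ncol = 2, byrow = TRUE)
Q <- matrix(c(10, 42,
               1, 55,
              22, 63), ncol = 2, byrow = TRUE)

# Hypothetical helper: position of P's midpoint relative to Q's length.
rel.mid <- function(p.start, p.end, q.start, q.end) {
  ((p.start + p.end) / 2 - q.start) / (q.end - q.start)
}

r <- rel.mid(P[, 1], P[, 2], Q[, 1], Q[, 2])

# Case B with invented coordinates: Q = (0, 15000), left gap 100, right gap 2.
r.caseB <- rel.mid(100, 14998, 0, 15000)

print(round(r, 3))
print(round(r.caseB, 3))
```

Values near 0.5 mean the midpoint of P sits near the middle of Q, which is what happens in Case B once Q is long.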
Re: [R] left end or right end
Hi,

You need to define what you want more exactly--what are the possible conclusions (hypotheses) you want to reach? Based on what you've said, I can think of several different approaches you might want, but I'm not sure which one of them you're actually after. For example:

Hypothesis A: The distance between the left endpoints of P and Q is less than (or equal to) the distance between the right endpoints.
Hypothesis B: The distance between the right endpoints is smaller.

This is a simple binomial test, as David Winsemius suggested.

In your most recent email, though, it sounds like you want to take into account how much smaller one distance is than the other. This is more complicated.

Another option occurred to me: maybe you don't care which end P is close to, you just want to know whether it's close to one of the ends, or somewhere in the middle.

Without knowing what exactly you are trying to test, it's very hard for us to help you.

Jonathan
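One possible way to take the size of the difference into account, offered only as a sketch and not as a method proposed in the thread, is a paired Wilcoxon signed-rank test on the left and right gaps after dividing each by the length of Q. The data below are simulated with a deliberate bias towards the left end; every variable name is hypothetical.

```r
set.seed(1)
n <- 200

# Simulated intervals: Q is placed at random, P lies inside Q.
q.start <- round(runif(n, 0, 1000))
q.len   <- round(runif(n, 50, 500))
q.end   <- q.start + q.len
p.len   <- round(q.len * runif(n, 0.1, 0.5))

# rbeta(n, 2, 5) has mean 2/7, so P tends to sit in the left part of Q.
p.start <- q.start + round((q.len - p.len) * rbeta(n, 2, 5))
p.end   <- p.start + p.len

# Gaps at each end, scaled by the length of Q.
left  <- (p.start - q.start) / q.len
right <- (q.end - p.end) / q.len

# Paired signed-rank test of H0: left and right gaps are symmetric.
wt <- wilcox.test(left, right, paired = TRUE)
print(wt$p.value)
print(median(left - right))
```

A negative median difference together with a small p-value would point towards the left end. Whether scaling by the length of Q is the right normalization depends on the biological question.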
Re: [R] left end or right end
There are three possibilities:

Case 1: Left end
  P  ----
  Q  ----------------

Case 2: Right end
  P              ----
  Q  ----------------

Case 3: At mid position
  P        ----
  Q  ----------------

My question is how my data are distributed across the three cases. Is it biased towards case 1 or case 2 or case 3? I have to consider the length of Q in the data.

Example: start2-start1 = 2 and end2-end1 = 3 does not make much difference if the length of Q is 15.

I do not hypothesize, I want to know how my data goes on.

Thanks and regards
Re: [R] left end or right end
Hi,

On Thu, Jul 1, 2010 at 10:24 AM, ravikumar sukumar ravikumarsuku...@gmail.com wrote:
> There are three possibilities:
> Case1: Left end
> Case2: Right end
> Case3: At mid position
> My question is how far my data falls on the all the three cases. Is it biased towards case1 or case2 or case3. I have to consider the length of Q in the data.
> Example: start2-start1 =2 and end2-end1 = 3 does not make much difference if length of Q is 15.
> I do not hypothesize, i want to know how my data goes on.

Please note that the suggestions I give below don't give you a means of doing statistical testing of any sort, I'm just giving you ideas to help you figure out what's going on in your data.

So: Why not just do some simple manipulations[*] and then plot the distribution of where all of your P's land in their respective Q's?

[*] Simple Manipulations

Maybe you can ask: "How far in (in terms of the percent of Q's length) does P start?"

I think you previously said that you know that P is always contained in its paired Q, so I'm going to assume this is true for simplicity.

Let's assume that you have two matrices P and Q. The rows are the paired p and q elements, the columns are their start,end positions.

    R> P.width <- P[,2] - P[,1] + 1
    R> Q.width <- Q[,2] - Q[,1] + 1

How far INTO Q does its paired P value start?

    ## P[,1] is always >= Q[,1]
    R> P.start <- P[,1] - Q[,1]

Now let's adjust Q's width, so we can ask something like "How far (%-wise) into Q does P land?"

    R> Q.width.adjust <- Q.width - P.width

And get the percent into Q that P starts in:

    ## divide by the adjusted width so that a P at the right end gives 100%
    R> how.far <- P.start / Q.width.adjust

This is untested code. I'm not promising that it works, but I'm just helping convey my idea into words. You'll likely have to debug as appropriate.

What I'm imagining should give you (for your examples):

    Case1 : 0%
    Case2 : 100%
    Case3 : 30% (?)

Then you can plot the density of how.far to see what's happening.

Another thing you can do is to use your P to split your Q into two segments, then plot the ratio of the length of the left segment vs. the length of the right. In order for this to work, I'm guessing you have to pad Q with 1 basepair (or whatever) on each side, i.e.:

Case 1:
  Originally:
    P  ----
    Q  ------------
  Transform the case by padding +1 on either side of Q:
    P   ----
    Q  --------------
  Split Q with P:
    Q1: -
    Q2:      ---------
  Now take the ratio: width(Q1) / width(Q2)

Case 2: Mirror of Case 1

Case 3:
  Originally:
    P      ----
    Q  ------------
  Transform by padding Q:
    P       ----
    Q  --------------
  Split Q with P:
    Q1: -----
    Q2:          -----
  Take the ratio: width(Q1) / width(Q2)

Plot the distribution of these ratios to see what's up. (Note that the width function is something you have to define.)

If you're dealing with this type of data and taking these types of approaches, I'd suggest looking into the IRanges package from Bioconductor, which will make working with these quite simple (after you read through its extensive documentation, of course -- this package *does* provide a width function, though ;-)

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
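Both ideas above can be wrapped in one small base-R function. This is a sketch under assumptions: the function name position.in.Q is hypothetical, positions are treated as inclusive integers (hence the +1 in the widths), pairs where P fills all of Q are returned as NA, and the fourth pair is an invented right-end case added to the three pairs posted earlier in the thread.

```r
# Three pairs from the thread plus one invented pair with P at the right end.
P <- matrix(c(12, 28,
               2,  5,
              32, 50,
              30, 42), ncol = 2, byrow = TRUE)
Q <- matrix(c(10, 42,
               1, 55,
              22, 63,
              10, 42), ncol = 2, byrow = TRUE)

# Hypothetical helper: where does each P sit inside its paired Q?
position.in.Q <- function(P, Q) {
  P.width <- P[, 2] - P[, 1] + 1
  Q.width <- Q[, 2] - Q[, 1] + 1
  slack   <- Q.width - P.width          # room P has to move inside Q

  # Fraction of the slack to the left of P: 0 = left end, 1 = right end.
  how.far <- ifelse(slack > 0, (P[, 1] - Q[, 1]) / slack, NA)

  # Segments of Q left and right of P, each padded by 1 so neither is empty.
  left  <- P[, 1] - Q[, 1] + 1
  right <- Q[, 2] - P[, 2] + 1

  data.frame(how.far = how.far, log2.ratio = log2(left / right))
}

res <- position.in.Q(P, Q)
print(res)
```

The ratio is reported on a log2 scale so that left-biased and right-biased pairs are symmetric around zero. Something like plot(density(res$how.far, na.rm = TRUE)) then shows the distribution once there are enough pairs.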
Re: [R] left end or right end
On Jul 1, 2010, at 10:24 AM, ravikumar sukumar wrote:

> There are three possibilities:
> Case1: Left end
> Case2: Right end
> Case3: At mid position
> My question is how far my data falls on the all the three cases. Is it biased towards case1 or case2 or case3. I have to consider the length of Q in the data.
> Example: start2-start1 =2 and end2-end1 = 3 does not make much difference if length of Q is 15.
> I do not hypothesize,

You may not hypothesize, but neither do you pose a clear question. At what point do the lengths go from being case 1 to case 3?

  P  ----
  Q  ------------

  P   ----
  Q  ------------

  P    ----
  Q  ------------

  P     ----
  Q  ------------

Your answer should be expressed in mathematical terms and you should present test cases constructed in R.

--
David

> i want to know how my data goes on.
> Thanks and regards

David Winsemius, MD
West Hartford, CT
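One way to answer the question in mathematical terms is to fix explicit cut points on the fraction of free space to the left of P. The thirds used below are an arbitrary assumption chosen only for illustration, and the test cases are constructed, not real data.

```r
# Constructed test cases: every Q is (0, 100); P slides from left to right.
Q <- cbind(start = rep(0, 4),        end = rep(100, 4))
P <- cbind(start = c(0, 40, 80, 10), end = c(20, 60, 100, 30))

# Free space in Q not covered by P, and the share of it lying left of P.
slack <- (Q[, "end"] - Q[, "start"]) - (P[, "end"] - P[, "start"])
frac  <- (P[, "start"] - Q[, "start"]) / slack

# Assumed rule: [0, 1/3] is "left", (1/3, 2/3] is "middle", (2/3, 1] is "right".
case <- cut(frac,
            breaks = c(0, 1/3, 2/3, 1),
            labels = c("left", "middle", "right"),
            include.lowest = TRUE)

counts <- table(case)
print(data.frame(frac = frac, case = case))
print(counts)
```

Changing the breaks argument changes where case 1 turns into case 3, which is exactly the decision that has to be made before any test can be chosen.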
Re: [R] left end or right end
Suku,

Just to clarify, in your table and each of your images, it appears that the start position of P (start1) is _after_ or at the start position of Q (start2), and the end position of P (end1) is _before_ or at the end position of Q (end2). If these positions represent increasing integers, then start1 >= start2 and end1 <= end2. I will assume this for the discussion below.

You mentioned wanting to know whether the midpoint of P tended to be greater or lesser than the midpoint of Q. That seems like a good idea, since the midpoints _must_ be similar when the lengths of P and Q are similar. Hence, if P and Q are samples from a population, then you may be interested in the population mean difference in midpoints. We can denote this mean M:

    M = E(mid(P) - mid(Q))

In order to do a classical statistical test, we _need_ a hypothesis about M, and a rule for rejecting the hypothesis. That's why we use the term 'hypothesis'. An appropriate hypothesis here might be:

    H0: M = 0

or, in words, "the mean difference in the P and Q midpoints is zero". A simple rejection rule for this hypothesis is: reject H0 when the observed mean difference in P and Q midpoints is greater than some quantity C, or less than -C. The trick then is to find the C that satisfies some type 1 error probability, usually 0.05. It's here that I might recommend a bootstrap procedure.

If, in the end, you reject the hypothesis H0, you can use the sign of the estimated mean difference in your biological inferences. ...And I'm still interested to hear what those are. :-)

Of course, these are just my ideas, you really ought to visit a biostatistician for professional advice.

-Matt
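A rough sketch of how such a bootstrap could look in base R. This is one common variant (resampling the centred differences to approximate the null distribution), not necessarily the procedure intended above; the data are simulated with a deliberate left bias and all names are hypothetical.

```r
set.seed(42)
n <- 300

# Simulated pairs: P lies inside Q and tends towards the left end of Q.
q.start <- runif(n, 0, 1000)
q.len   <- runif(n, 50, 500)
p.len   <- q.len * runif(n, 0.1, 0.5)
p.start <- q.start + (q.len - p.len) * rbeta(n, 2, 5)

# d = mid(P) - mid(Q) for each pair; its sample mean estimates M.
d   <- (p.start + p.len / 2) - (q.start + q.len / 2)
obs <- mean(d)

# Centre the differences so that H0: M = 0 holds in the resampled world.
d0 <- d - obs
B  <- 2000
boot.means <- replicate(B, mean(sample(d0, replace = TRUE)))

# Critical value for a two-sided test with type 1 error 0.05.
crit   <- unname(quantile(abs(boot.means), 0.95))
reject <- abs(obs) > crit

print(c(observed = obs, critical = crit))
print(reject)
```

If reject is TRUE, the sign of obs indicates the direction: negative means the midpoints of P tend to lie left of the midpoints of Q.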