Re: [R] left end or right end

2010-07-01 Thread Joris Meys
First of all, read the posting guide carefully :
http://www.R-project.org/posting-guide.html
Your question is far from clear. When you say that the lengths of P
and Q are different, do you mean the length of the data or the difference
between start and end? That makes a world of difference.

Regarding the statistical test, that depends on what your data
represent. Is it possible for P to fall close to both the left and the
right end, for example:
P-
Q   ---

You should also specify which test you want to use. Then people on the
list will be able to tell you whether it is available in R. You can
of course construct your own test with the tools R provides, but
again, this requires a lot more information. Besides that, the list is
not actually intended for statistical advice, but for advice regarding
R code. Maybe somebody will join in with some statistical guidance,
but if you don't know what to do, you had better consult a statistician
in your department.

Cheers
Joris

On Thu, Jul 1, 2010 at 1:53 PM, ravikumar sukumar
ravikumarsuku...@gmail.com wrote:
 Dear all,
 I am a biologist. I have two sets of distance P(start1, end1) and Q(start2,
 end2).
 The distance will be like this.
 P         
 Q  

 I want to know whether P falls closely to the right end or left  end of Q.
  P and Q are of different lengths for each data point. There are more than
 1 pairs of P and Q.
 Is there any test or function in R to bring a statistically significant
 conclusion.

 Thanks for all,
 Suku

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] left end or right end

2010-07-01 Thread Matt Shotwell
Suku, 

It looks like you might want to consult with a [bio]statistician, but
I'm interested in what these distances represent. Can you give some
additional context for your problem? How were these distances collected?
Is it a collection of pairs of intervals, like this:

         P                 Q
1)  (1.5, 1.8)        (1.2, 2.0)
2)  (1.4, 1.9)        (1.4, 2.3)
...
1)  (start1, end1)    (start2, end2)

?

If so, is there a more specific test you're interested in? For instance,
whether the interval P overlaps with the start/stop position of interval
Q, or whether start1 == start2, or end1 == end2, or both? I can think of
a bootstrap test for hypotheses like this, and this is relatively easy
in R.
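
As a minimal sketch of the checks listed above, the pairs could be held in a data frame with one row per (P, Q) pair. The column names and the object name `pairs` are assumptions made for illustration; the two rows are the example intervals from the table above.

```r
# Hypothetical layout: one row per (P, Q) pair, using the two example rows above.
pairs <- data.frame(
  start1 = c(1.5, 1.4), end1 = c(1.8, 1.9),   # interval P
  start2 = c(1.2, 1.4), end2 = c(2.0, 2.3)    # interval Q
)

# Comparison operators are vectorized, so each check yields one value per pair.
same_start <- pairs$start1 == pairs$start2
same_end   <- pairs$end1 == pairs$end2
p_inside_q <- pairs$start1 >= pairs$start2 & pairs$end1 <= pairs$end2

print(cbind(pairs, same_start, same_end, p_inside_q))
```

Exact equality with `==` is only safe for integer positions or identical literals; for measured values a tolerance would probably be more appropriate.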

-Matt

-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com



Re: [R] left end or right end

2010-07-01 Thread David Winsemius


On Jul 1, 2010, at 7:53 AM, ravikumar sukumar wrote:

> I want to know whether P falls closely to the right end or left end
> of Q.
> P and Q are of different lengths for each data point.

Do you want to know whether P(start1) - Q(start2) < P(end1) - Q(end2)?

The arithmetic operators and comparison operators are vectorized.

> There are more than
> 1 pairs of P and Q.

You could offer an example:

?head

> Is there any test or function in R to bring a statistically
> significant conclusion.

?binom.test  # if my interpretation above is what you were asking.


David Winsemius, MD
West Hartford, CT



Re: [R] left end or right end

2010-07-01 Thread David Winsemius


On Jul 1, 2010, at 9:00 AM, David Winsemius wrote:

> Do you want to know whether

Should have been: abs( P(start1) - Q(start2) ) < abs( P(end1) - Q(end2) )

David Winsemius, MD
West Hartford, CT
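
A minimal sketch of the comparison and the `binom.test` call suggested in this message. The data are simulated purely for illustration (the real pairs were never posted), and all variable names are assumptions. Ties, where both gaps are equal, are counted here as "not closer to the left".

```r
set.seed(1)                                   # made-up data, for illustration only
n      <- 200
start2 <- sample(1:1000, n, replace = TRUE)   # Q start
q_len  <- sample(50:500, n, replace = TRUE)   # Q length
end2   <- start2 + q_len                      # Q end
p_len  <- sample(5:40, n, replace = TRUE)     # P length, always shorter than Q
start1 <- start2 + floor(runif(n) * (q_len - p_len))  # P placed uniformly inside Q
end1   <- start1 + p_len

# Vectorized comparison: TRUE where P is nearer the left end of Q.
closer_left <- abs(start1 - start2) < abs(end1 - end2)
print(head(data.frame(start1, end1, start2, end2, closer_left)))

# H0: P is equally likely to be nearer either end (probability 0.5).
res <- binom.test(sum(closer_left), n, p = 0.5)
print(res)
```

This only uses which gap is smaller, not by how much, so it ignores the lengths of P and Q.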



Re: [R] left end or right end

2010-07-01 Thread ravikumar sukumar
Sorry for posting to the R list.

P        Q
12, 28   10, 42
2, 5     1, 55
32, 50   22, 63
... there are 1 points of P and Q.
The number of points of P and Q is equal (i.e. 1).

The interval P always overlaps with Q, i.e. start1 > start2 and end1 < end2.

Merely checking whether the points satisfy this condition (start1 >
start2 and end1 < end2) will not be informative, because the length of P
(end1 - start1) and the length of Q (end2 - start2) differ.

Example
Case A:
start1 - start2 = 2
end2 - end1 = 3

Case B:
start1 - start2 = 100
end2 - end1 = 2

Of these two cases, P falls at the right end of Q in case B. But that
depends on the length of Q (end2 - start2). If that length is 15000 in
case B, then P is almost at the midpoint.

Is there any test or function in R to reach a statistically
significant conclusion about whether the midpoint of P, or P itself,
falls at the left end or the right end of Q?

Sorry once again for posting to this list.

Regards



Re: [R] left end or right end

2010-07-01 Thread Jonathan Christensen
Hi,

You need to define what you want more exactly--what are the possible
conclusions (hypotheses) you want to reach? Based on what you've said, I can
think of several different approaches you might want, but I'm not sure which
one of them you're actually after. For example:

Hypothesis A: The distance between the left endpoints of P and Q is less
than (or equal to) the distance between the right endpoints.
Hypothesis B: The distance between the right endpoints is smaller.

This is a simple binomial test, as David Winsemius suggested. In your most
recent email, though, it sounds like you want to take into account how much
smaller one distance is than the other. This is more complicated.
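
One possible way to take the size of the difference into account, offered as a sketch rather than as the method anyone in this thread proposed, is a paired Wilcoxon signed-rank test on the two gaps after scaling each by the length of Q. The three rows are the example pairs from the previous message and are far too few for a meaningful result; they only show the mechanics. The variable names are assumptions.

```r
# Example pairs from the previous message; replace with the real data.
start1 <- c(12, 2, 32); end1 <- c(28, 5, 50)    # interval P
start2 <- c(10, 1, 22); end2 <- c(42, 55, 63)   # interval Q

q_len     <- end2 - start2
left_gap  <- (start1 - start2) / q_len   # gap at the left end, as a fraction of Q
right_gap <- (end2 - end1) / q_len       # gap at the right end, as a fraction of Q

# Paired signed-rank test: uses the size of each difference, not just its sign.
res <- wilcox.test(left_gap, right_gap, paired = TRUE)
print(res)
```

Scaling by the length of Q is a choice, not a requirement; whether it suits the biology is a question for whoever knows what the intervals represent.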

Another option occurred to me: maybe you don't care which end P is close to,
you just want to know whether it's close to one of the ends, or somewhere in
the middle.

Without knowing what exactly you are trying to test, it's very hard for us
to help you.

Jonathan




Re: [R] left end or right end

2010-07-01 Thread ravikumar sukumar
There are three possibilities:

Case 1: Left end

P ------
Q ------------------------------

Case 2: Right end

P                         ------
Q ------------------------------

Case 3: At mid position

P         ------
Q ------------------------------


My question is how my data are distributed over these three cases: are
they biased towards case 1, case 2 or case 3? I have to consider the
length of Q in the data. For example, start1 - start2 = 2 and
end2 - end1 = 3 does not make much difference if the length of Q is 15.

I do not hypothesize; I want to know how my data behave.

Thanks and regards









Re: [R] left end or right end

2010-07-01 Thread Steve Lianoglou
Hi,


Please note that the suggestions I give below don't give you a means
of doing statistical testing of any sort, I'm just giving you ideas to
help you figure out what's going on in your data.

So:

Why not just do some simple manipulations[*] and then plot the
distribution of where all of your P's land in their respective Q's?

[*] Simple Manipulations

Maybe you can ask:
How far in (in terms of the percent-of-Q's length) does P start?

I think you previously said that you know that P is always contained
in its paired Q, so I'm going to assume this is true for simplicity:

Let's assume that you have two matrices P and Q. The rows are the
paired p and q elements, the columns are their start,end positions.

R> P.width <- P[,2] - P[,1] + 1
R> Q.width <- Q[,2] - Q[,1] + 1

How far INTO Q does its paired P value start?

## P[,1] is always >= Q[,1]
R> P.start <- P[,1] - Q[,1]

Now let's adjust Q's width, so we can ask something like "How far
(%-wise) into Q does P land?"

R> Q.width.adjust <- Q.width - P.width

And get the percent into Q that P starts in

R> how.far <- P.start / Q.width.adjust

This is untested code. I'm not promising that it works, but I'm just
helping convey my idea into words. You'll likely have to debug as
appropriate.

What I'm imagining should give you (for your examples):

Case1 : 0%
Case2 : 100%
Case3 : 30% (?)

Then you can plot the density of how.far to see what's happening.
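
Here is a self-contained sketch of the same idea, using the three example pairs posted earlier in the thread. The name `slack` and the handling of pairs where P fills Q completely are assumptions added for illustration.

```r
# The three example pairs from earlier in the thread: columns are start, end.
P <- matrix(c(12, 28,   2,  5,  32, 50), ncol = 2, byrow = TRUE)
Q <- matrix(c(10, 42,   1, 55,  22, 63), ncol = 2, byrow = TRUE)

P.width <- P[, 2] - P[, 1] + 1
Q.width <- Q[, 2] - Q[, 1] + 1
P.start <- P[, 1] - Q[, 1]        # offset of P's start from Q's start
slack   <- Q.width - P.width      # room P has to slide inside Q

# 0 = flush with Q's left end, 1 = flush with Q's right end.
# Pairs where P fills Q entirely (slack == 0) have no defined position.
how.far <- ifelse(slack > 0, P.start / slack, NA)
print(round(how.far, 3))

# Density estimate restricted to [0, 1]; only plotted in an interactive session.
d <- density(how.far, na.rm = TRUE, from = 0, to = 1)
if (interactive()) plot(d, main = "Relative position of P within Q")
```

With real data, a pile-up near 0 or near 1 would suggest P tends to sit at the left or right end, and a bump around 0.5 would suggest the middle.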



Another thing you can do is to use your P to split your Q into two
segments, then plot the ratio of the length of the left segment vs.
the length of the right.

In order for this to work, I'm guessing you have to pad Q with 1
basepair (or whatever) on each side, ie:

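Case 1:
Originally:
  P ------
  Q ------------------------------

Xform case by padding +1 on either side of Q:
  P  ------
  Q --------------------------------

Split Q with P:

  Q1: -
  Q2: -------------------------

Now take ratio:
  width(Q1) / width(Q2)

Case 2:
Mirror Case 1

Case 3:
Originally:
  P         ------
  Q ------------------------------

Xform by padding Q:
  P          ------
  Q --------------------------------

Split Q with P:
  Q1: ---------
  Q2: -----------------

Take ratio:
  width(Q1) / width(Q2)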

Plot the distribution of these ratios to see what's up. (Note that the
width function is something you have to define)
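
A sketch of the split-and-ratio idea in base R, again on the three example pairs from earlier in the thread. The `width` helper, the `pad` value and the log transform at the end are all assumptions for illustration, not part of any package.

```r
# Hypothetical helper: inclusive width of an interval.
width <- function(start, end) end - start + 1

P <- matrix(c(12, 28,   2,  5,  32, 50), ncol = 2, byrow = TRUE)
Q <- matrix(c(10, 42,   1, 55,  22, 63), ncol = 2, byrow = TRUE)

pad   <- 1                                   # pad Q by one position on each side
Q.pad <- cbind(Q[, 1] - pad, Q[, 2] + pad)

Q1.width <- width(Q.pad[, 1], P[, 1] - 1)    # part of padded Q to the left of P
Q2.width <- width(P[, 2] + 1, Q.pad[, 2])    # part of padded Q to the right of P

ratio     <- Q1.width / Q2.width
log.ratio <- log2(ratio)   # optional: symmetric around 0, easier to read in a histogram
print(data.frame(Q1.width, Q2.width, ratio, log.ratio))
```

The padding guarantees that both segments have width of at least 1, so the ratio is always defined when P lies inside Q.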

If you're dealing with this type of data and taking these types of
approaches, I'd suggest looking into the IRanges package from
Bioconductor, which will make working with these quite simple (after
you read through its extensive documentation, of course). This
package *does* provide a width function, though ;-)


HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] left end or right end

2010-07-01 Thread David Winsemius


On Jul 1, 2010, at 10:24 AM, ravikumar sukumar wrote:

> My question is how far my data falls on the all the three cases. Is it
> biased towards case1 or case2 or case3. I have to consider the
> length of Q in the data.
>
> I do not hypothesize,


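You may not hypothesize, but neither do you pose a clear question. At
what point do the lengths go from being case 1 to case 3?

P  ------
Q ------------------------------

P   ------
Q ------------------------------

P    ------
Q ------------------------------

P     ------
Q ------------------------------

Your answer should be expressed in mathematical terms and you should
present test cases constructed in R.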
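
To illustrate what such test cases might look like, here is a sketch that builds a few intervals by hand and assigns each to a case. The cutoffs at 1/3 and 2/3 are entirely hypothetical (the thread never settles on a boundary), and `rel.pos` is a made-up helper name.

```r
# Relative position of P inside Q: 0 = left end, 1 = right end.
rel.pos <- function(start1, end1, start2, end2) {
  slack <- (end2 - start2) - (end1 - start1)   # room P has to slide inside Q
  (start1 - start2) / slack
}

# Hand-built test cases: Q is always (1, 100) and P always spans 10 positions.
start1 <- c(1, 20, 46, 70, 91)
end1   <- start1 + 9
pos    <- rel.pos(start1, end1, start2 = 1, end2 = 100)

# Hypothetical cutoffs at 1/3 and 2/3 separate "left", "middle" and "right".
case <- cut(pos, breaks = c(0, 1/3, 2/3, 1),
            labels = c("left", "middle", "right"), include.lowest = TRUE)
print(data.frame(start1, end1, pos = round(pos, 3), case))
```

Stating the boundary explicitly like this is what turns "left, right or middle" into something that can be counted and tested.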


--
David





David Winsemius, MD
West Hartford, CT



Re: [R] left end or right end

2010-07-01 Thread Matt Shotwell
Suku, 

Just to clarify, in your table and each of your images, it appears that
the start position of P (start1) is _after_ or at the start position of
Q (start2), and the end position of P (end1) is _before_ or at the end
position of Q (end2). If these positions represent increasing integers,
then start1 >= start2 and end1 <= end2. I will assume this for the
discussion below.
  
You mentioned wanting to know whether the midpoint of P tended to be
greater or lesser than the midpoint of Q. That seems like a good idea,
since the midpoints _must_ be similar when the lengths of P and Q are
similar. Hence, if P and Q are samples from a population, then you may
be interested in the population mean difference in midpoints. We can
denote this mean M:

M = E(mid(P) - mid(Q))

In order to do a classical statistical test, we _need_ a hypothesis
about M, and a rule for rejecting the hypothesis. That's why we use the
term 'hypothesis'. An appropriate hypothesis here might be:

H0: M = 0

or, in words, the mean difference in the P and Q midpoints is zero. A
simple rejection rule for this hypothesis is:

reject H0 when the observed mean difference in P and Q midpoints is
greater than some quantity C, or less than -C.

The trick then is to find C that satisfies some type 1 error
probability, usually 0.05. It's here that I might recommend a bootstrap
procedure.
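
A minimal sketch of one way such a bootstrap could be set up, on simulated data since the real pairs were not posted. Centring the differences so that H0 holds before resampling is one common choice and is an assumption here, as are all the variable names.

```r
set.seed(42)                       # made-up data, for illustration only
n      <- 500
start2 <- sample(1:1000, n, replace = TRUE)
end2   <- start2 + sample(50:500, n, replace = TRUE)
p_len  <- sample(5:40, n, replace = TRUE)
start1 <- start2 + floor(runif(n) * (end2 - start2 - p_len))  # P inside Q
end1   <- start1 + p_len

d     <- (start1 + end1) / 2 - (start2 + end2) / 2   # mid(P) - mid(Q) per pair
m.hat <- mean(d)                                     # estimate of M

# Bootstrap the null distribution: centre d so that H0 (M = 0) holds,
# then resample pairs with replacement and recompute the mean.
B      <- 2000
d0     <- d - m.hat
m.star <- replicate(B, mean(sample(d0, n, replace = TRUE)))

C.crit  <- unname(quantile(abs(m.star), 0.95))   # cutoff C for type 1 error 0.05
reject  <- abs(m.hat) > C.crit                   # reject H0 when |mean difference| > C
p.value <- mean(abs(m.star) >= abs(m.hat))       # bootstrap two-sided p-value
print(c(m.hat = m.hat, C = C.crit, p.value = p.value))
```

Note that this compares raw midpoints and does not scale by the length of Q; whether that matters depends on what the intervals represent.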

If, in the end, you reject the hypothesis H0, you can use the sign of
the estimated mean difference in your biological inferences. ...And I'm
still interested to hear what those are. :-) Of course, these are just
my ideas, you really ought to visit a biostatistician for professional
advice.

-Matt


