Hi,

On Thu, Jul 1, 2010 at 10:24 AM, ravikumar sukumar <ravikumarsuku...@gmail.com> wrote:
> There are three possibilities:
>
> Case1: Left end
>
> P--------------
> Q--------------------------------------
>
> Case2: Right end
>
> P                        --------------
> Q--------------------------------------
>
> Case3: At mid position
>
> P       -------------
> Q--------------------------------------
>
> My question is how my data is distributed across the three cases: is it
> biased towards case 1, case 2 or case 3? I have to take the length of Q
> into account. For example, start2 - start1 = 2 and end2 - end1 = 3 does not
> make much difference if the length of Q is 150000.
>
> I don't have a hypothesis; I just want to know how my data behaves.
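To make "take the length of Q into account" concrete, here is a minimal base-R sketch that sorts each pair into the three cases by measuring the gap at each end as a fraction of Q's width. It assumes the same layout described below (two matrices P and Q, one row per pair, columns start and end, each P inside its Q). The toy matrices, the function name `classify_case()` and the cutoff `tol` are all hypothetical, made up for illustration; pick a cutoff that makes sense for your data.

```r
# Hypothetical toy data: one row per p/q pair, columns are (start, end),
# 1-based and inclusive, and each P lies inside its Q.
P <- rbind(c(1, 14),    # case 1: P flush with Q's left end
           c(25, 38),   # case 2: P flush with Q's right end
           c(8, 20))    # case 3: P somewhere in the middle
Q <- rbind(c(1, 38),
           c(1, 38),
           c(1, 38))

# Gaps are measured relative to Q's width, so a 2 bp gap in a
# 150,000 bp Q still counts as "flush". `tol` is an assumed cutoff.
classify_case <- function(P, Q, tol = 0.01) {
  Q.width   <- Q[, 2] - Q[, 1] + 1
  left.gap  <- (P[, 1] - Q[, 1]) / Q.width
  right.gap <- (Q[, 2] - P[, 2]) / Q.width
  ifelse(left.gap <= tol & right.gap <= tol, "both ends",
    ifelse(left.gap <= tol, "left end",
      ifelse(right.gap <= tol, "right end", "middle")))
}

# Tally how many pairs fall into each case
table(classify_case(P, Q))
```

The "both ends" label only catches pairs where P covers (almost) all of Q, which none of the three diagrams show but which real data may contain.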
Please note that the suggestions below don't give you a means of doing any sort of statistical testing; I'm just giving you ideas to help you figure out what's going on in your data.

So: why not just do some simple manipulations[*] and then plot the distribution of where all of your P's land in their respective Q's?

[*] Simple manipulations

Maybe you can ask: how far "in" (as a percentage of Q's length) does P start?

I think you previously said that P is always contained in its paired Q, so I'm going to assume that's true for simplicity. Let's assume that you have two matrices, P and Q. The rows are the "paired" p and q elements; the columns are their start and end positions.

R> P.width <- P[,2] - P[,1] + 1
R> Q.width <- Q[,2] - Q[,1] + 1

How far INTO Q does its paired P start?

## P[,1] is always >= Q[,1]
R> P.start <- P[,1] - Q[,1]

Now let's adjust Q's width by subtracting P's width, so that it equals the largest value P.start can take. Then we can ask something like "how far (percent-wise) into Q does P land?"

R> Q.width.adjust <- Q.width - P.width

And get the "percent into Q that P starts in" (as a fraction between 0 and 1; it is NaN when P is exactly as wide as Q):

R> how.far <- P.start / Q.width.adjust

This is untested code. I'm not promising that it works; I'm just trying to put my idea into words, and you'll likely have to debug as appropriate. What I'm imagining should give you (for your examples):

Case1: 0%
Case2: 100%
Case3: 30% (?)

Then you can plot the density of how.far to see what's happening.

++++++++++++++++++++++++++++++++++++++++++++++++++++

Another thing you can do is use your P to split your Q into two segments, then plot the ratio of the length of the left segment to the length of the right one.
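The steps above can be collected into one small function. This is a minimal base-R sketch, assuming the same P/Q matrix layout; the function name `how_far_into()` and the toy matrices are hypothetical. The third toy row places P roughly where the Case3 diagram does.

```r
# Hypothetical toy data: rows are paired p/q elements, columns are
# (start, end), and each P lies inside its Q.
P <- rbind(c(1, 14), c(25, 38), c(8, 20))
Q <- rbind(c(1, 38), c(1, 38), c(1, 38))

how_far_into <- function(P, Q) {
  # The approach assumes every P is contained in its paired Q
  stopifnot(all(P[, 1] >= Q[, 1] & P[, 2] <= Q[, 2]))
  P.width <- P[, 2] - P[, 1] + 1
  Q.width <- Q[, 2] - Q[, 1] + 1
  # Offset of P's start from Q's start
  P.start <- P[, 1] - Q[, 1]
  # Largest possible offset of P inside Q; 0 when P fills Q
  Q.width.adjust <- Q.width - P.width
  # 0 = flush left, 1 = flush right; NaN (0/0) when P fills all of Q
  P.start / Q.width.adjust
}

how.far <- how_far_into(P, Q)
round(100 * how.far)   # 0 100 28

# Density of where the P's land (skipped when run non-interactively)
if (interactive()) {
  plot(density(how.far, na.rm = TRUE),
       main = "How far into Q does P start?")
}
```

Dividing by `Q.width.adjust` rather than `Q.width` is what makes a P flush against Q's right end come out at exactly 100%.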
In order for this to work, I'm guessing you have to pad Q with 1 basepair (or whatever) on each side, i.e.:

Case 1:

  Originally:
    P--------------
    Q--------------------------------------

  Transform by padding Q with +1 on either side:
    P --------------
    Q----------------------------------------

  Split Q with P:
    Q1: -
    Q2: -------------------------

  Now take the ratio: width(Q1) / width(Q2)

Case 2: mirror image of Case 1 (here Q2 is just the 1 bp pad, so the ratio is large).

Case 3:

  Originally:
    P       -------------
    Q--------------------------------------

  Transform by padding Q:
    P        -------------
    Q----------------------------------------

  Split Q with P:
    Q1: --------
    Q2: -------------------

  Take the ratio: width(Q1) / width(Q2)

Plot the distribution of these ratios to see what's up. (Note that the "width" function is something you have to define.)

If you're dealing with this type of data and taking these types of approaches, I'd suggest looking into the IRanges package from Bioconductor, which makes working with intervals like these quite simple (after you read through its extensive documentation, of course; this package *does* provide a "width" function, though ;-)

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
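A minimal base-R sketch of the pad-and-split idea, assuming the same P/Q layout and a 1-position pad on each side of Q. The function name `flank_ratio()`, its `pad` argument and the toy data are hypothetical; it computes the two widths straight from the coordinates instead of defining a separate width function. Plotting log2 of the ratio is an extra step not in the suggestion above: it puts Case 1 and its mirror image, Case 2, at equal distances either side of 0.

```r
# Hypothetical toy data: rows are paired p/q elements, columns are
# (start, end), and each P lies inside its Q.
P <- rbind(c(1, 14), c(25, 38), c(8, 20))
Q <- rbind(c(1, 38), c(1, 38), c(1, 38))

flank_ratio <- function(P, Q, pad = 1) {
  stopifnot(all(P[, 1] >= Q[, 1] & P[, 2] <= Q[, 2]))
  # Pad Q by `pad` positions on each side so neither piece has zero width
  padded.start <- Q[, 1] - pad
  padded.end   <- Q[, 2] + pad
  # Width of the padded Q to the left of P (Q1) and to its right (Q2)
  left.width  <- P[, 1] - padded.start
  right.width <- padded.end - P[, 2]
  left.width / right.width
}

ratios <- flank_ratio(P, Q)   # 1/25, 25/1 and 8/19 for the toy rows

# Distribution of the (log2) ratios, skipped when run non-interactively
if (interactive()) {
  hist(log2(ratios), main = "log2(width(Q1) / width(Q2))")
}
```

With the toy rows, the widths match the diagrams: Case 1 splits the padded Q into 1 and 25 positions, and Case 3 into 8 and 19.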