Hi,
I have two large data sets.

data_1:
> dim(data_1)
[1] 15820 5

> head(data_1)
   Chromosome  Start     End   Feature GroupA_3
1:       chr1 521369  750000 chr1-0001    0.170
2:       chr1 750001  800000 chr1-0002   -0.086
3:       chr1 800001  850000 chr1-0003    0.006
4:       chr1 850001  900000 chr1-0004    0.050
5:       chr1 900001  950000 chr1-0005    0.062
6:       chr1 950001 1000000 chr1-0006   -0.016

data_2:
> dim(data_2)
[1] 470870 5

> head(data_2)
   Chromosome Start   End    Feature GroupA_3
1:       chr1 15864 15865 cg13869341    0.207
2:       chr1 18826 18827 cg14008030   -0.288
3:       chr1 29406 29407 cg12045430   -0.331
4:       chr1 29424 29425 cg20826792   -0.074
5:       chr1 29434 29435 cg00381604    0.141
6:       chr1 68848 68849 cg20253340   -0.458


What I want to do:
Using the "Chromosome", "Start" and "End" columns of the two data sets, I want
to find, for every range (from "Start" to "End") in data_1, which rows of data_2
(more precisely, which "Feature" values) fall inside that range. The
"Chromosome" values must also match between the two data sets.
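
To make the desired output concrete, here is a minimal base-R sketch on toy
data. The values, the helper name match_probes_to_bins() and the output column
"Bin" are made up for illustration; they are not from my real data. It also
assumes that the ranges in data_1 do not overlap each other within a
chromosome, which looks true for my bins but is an assumption. It is only meant
to show the kind of result I am after, not a tested solution for the full data.

```r
# Toy stand-ins for data_1 (bins) and data_2 (probes); values are invented.
bins <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start      = c(1, 101, 1),
  End        = c(100, 200, 100),
  Feature    = c("chr1-0001", "chr1-0002", "chr2-0001"),
  stringsAsFactors = FALSE
)
probes <- data.frame(
  Chromosome = c("chr1", "chr1", "chr1", "chr2", "chr3"),
  Start      = c(10, 150, 250, 50, 5),
  End        = c(11, 151, 251, 51, 6),
  Feature    = c("cgA", "cgB", "cgC", "cgD", "cgE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each probe, find the bin on the same chromosome
# that fully contains it. Assumes bins on one chromosome do not overlap.
match_probes_to_bins <- function(probes, bins) {
  probes$Bin <- NA_character_
  shared <- intersect(unique(probes$Chromosome), unique(bins$Chromosome))
  for (chr in shared) {
    b <- bins[bins$Chromosome == chr, ]
    b <- b[order(b$Start), ]                  # findInterval() needs sorted starts
    rows <- which(probes$Chromosome == chr)
    # Index of the last bin whose Start is <= the probe Start (0 if none).
    idx <- findInterval(probes$Start[rows], b$Start)
    ok <- idx > 0
    inside <- rep(FALSE, length(rows))
    # The probe is inside the bin only if it also ends before the bin ends.
    inside[ok] <- probes$End[rows][ok] <= b$End[idx[ok]]
    probes$Bin[rows[inside]] <- b$Feature[idx[inside]]
  }
  probes[!is.na(probes$Bin), c("Bin", "Feature", "Chromosome", "Start", "End")]
}

result <- match_probes_to_bins(probes, bins)
print(result)
```

In this toy example the probes cgA, cgB and cgD are matched to a bin, while cgC
(outside every bin) and cgE (a chromosome with no bins) are dropped.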

I have tried the "GenomicRanges" package as described in this post:
https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
but I was not successful. Can anyone please help me do this quickly, since the
data sets are very large?
Thanks in advance.


Regards,
Tanvir Ahamed Stockholm, Sweden     |  mashra...@yahoo.com


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
