I should also add that merging on only one column runs fine, but the
result is not what I want.
merge(data_lane6_snps, data_lane6_snps_rsid, by = c("SNP")) : works as
expected.
Is the "chr" column being a factor creating problems here?
-A
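
A note on the str() output quoted below: the "chr" levels in
data_lane6_snps look like "chr1","chr10",... while those in
data_lane6_snps_rsid look like "1","10","11",... If that holds across all
levels, no (SNP, chr) pair can match exactly, which would explain an empty
result regardless of the column being a factor. Here is a minimal sketch of
that effect and one possible fix. The data frames snps and rsid below are
made-up toy stand-ins, not the real tables, and the "chr" prefix rule is an
assumption read off the str() output.

```r
# Toy stand-ins for the two data frames (hypothetical values).
snps <- data.frame(
  chr   = factor(c("chr1", "chr1", "chr2")),
  SNP   = c(100L, 101L, 100L),
  depth = c(1L, 2L, 2L)
)
rsid <- data.frame(
  chr  = factor(c("1", "2")),
  SNP  = c(100L, 100L),
  rsid = c("rs10", "rs11")
)

# The chromosome labels never coincide, so the two-column merge is empty.
intersect(levels(snps$chr), levels(rsid$chr))   # character(0)
nrow(merge(snps, rsid, by = c("SNP", "chr")))   # 0

# Put both keys on the same naming scheme, as plain character, and retry.
snps$chr <- as.character(snps$chr)
rsid$chr <- paste0("chr", as.character(rsid$chr))
merged <- merge(snps, rsid, by = c("SNP", "chr"))
merged
```

Running intersect() on the real levels first is a cheap way to confirm or
rule out this explanation before re-running the full merge.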
On Tue, Apr 6, 2010 at 4:03 PM, Abhishek Pratap <[email protected]> wrote:
> Hi David
>
> Here it is. You can ignore the bio jargon if it sounds confusing. The
> data types of the columns (SNP, chr) on which I am applying merge are
> the same in both data frames.
>
> merge(data_lane6_snps, data_lane6_snps_rsid, by = c("SNP","chr"))
>
>
> str(data_lane6_snps)
> 'data.frame': 7724462 obs. of 10 variables:
> $ chr : Factor w/ 25 levels "chr1","chr10",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ SNP : int 100 101 103 108 179 180 191 197 218 222 ...
> $ reference : Factor w/ 5 levels "A","C","G","N",..: 2 2 5 2 2 5 2 2 1 5 ...
> $ genotype : Factor w/ 10 levels "A","C","G","K",..: 1 1 1 8 2 2 3 8 2 2 ...
> $ consensus_qual: int 0 0 0 4 33 33 19 19 19 19 ...
> $ snp_qual : int 0 0 0 4 0 33 19 19 19 19 ...
> $ rms_qual : int 0 0 0 0 21 21 21 21 21 21 ...
> $ depth : int 1 1 1 1 2 2 2 2 2 2 ...
> $ bases : Factor w/ 453774 levels "^!,","^!,^!,",..: 5 5 5 410998 49793 155731 284998 416878 133393 133393 ...
> $ base_quality : Factor w/ 555104 levels "`","``","```",..: 359 359 359 54813 92856 92856 92856 92856 92539 55424 ...
>
> > str(data_lane6_snps_rsid)
> 'data.frame': 797807 obs. of 4 variables:
> $ chr : Factor w/ 24 levels "1","10","11",..: 3 3 3 3 3 3 3 3 3 3 ...
> $ SNP : int 68143872 11071026 69423434 12394791 1302846 95330693 3921381 57122299 41899656 76990037 ...
> $ end : int 68143872 11071026 69423434 12394791 1302846 95330693 3921381 57122299 41899656 76990037 ...
> $ rsid: Factor w/ 797807 levels "rs10","rs10000010",..: 100229 685690 505395 470219 780326 29342 29263 327909 434159 723152 ...
>
>
> On Tue, Apr 6, 2010 at 3:59 PM, David Winsemius <[email protected]> wrote:
>
>>
>> On Apr 6, 2010, at 3:54 PM, Abhishek Pratap wrote:
>>
>> Hi Guys
>>>
>>> I have two data frames which I would like to merge on two conditions.
>>>
>>> I am doing the following (abstract form)
>>>
>>> new.data.frame <- merge(df1,df2, by=c("Col1","Col2"))
>>>
>>
>> What does
>>
>> str(df1) ; str(df2)
>>
>> ... show?
>>
>>
>>
>>> It is giving me a null result.
>>>
>>> Basically I need to apply two conditions.
>>>
>>> I also tried sqldf but it is running forever. Will indexing help ?
>>>
>>> temp <- sqldf("select a.chr,a.SNP,a.snp_qual,a.rms_qual,a.depth,b.rsid
>>> FROM
>>> + data_lane6_snps a,
>>> + data_lane6_snps_rsid b
>>> + WHERE
>>> + a.SNP = b.SNP
>>> + AND
>>> + a.chr = b.chr
>>> + ")
>>>
>>> Thanks!
>>> -Abhi
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [email protected] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>