[R] Does SQL group by have a heavy duty equivalent in R

Farrel Buchinsky Sat, 30 Dec 2006 23:18:40 -0800

I have hundreds of humans who have undergone SNP genotyping at hundreds of
loci. Some have even undergone the procedure twice or thrice (kind of an
internal control).


So obviously I need to find those replications, and confirm that the results
are the same. If there is discordance then I need to address it.

I tried to use the aggregate function:

nr.attempts <- aggregate(RawSeq$GENOTYPE_ID,
    list(sample = RawSeq$SAMPLE_ID, assay = RawSeq$ASSAY_ID), length)
This was simply to count how many times each sample/assay combination had
been genotyped. I ran out of patience: it took beyond forever, and tapply
did not perform much better. The reshape package did not help either; it
implied one was out of luck if the data were not numeric, and all of my data
are character or factor.
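
To make the operation concrete, here is a minimal sketch on a toy table. The values are made up, and only the column names SAMPLE_ID, ASSAY_ID and GENOTYPE_ID come from the real RawSeq. It runs the aggregate call above and then gets the same counts by tabulating a single pasted key, which is one commonly suggested way to avoid grouping on two factors at once. Whether that is actually faster on the full data is an assumption that has not been benchmarked here.

```r
# Toy stand-in for RawSeq (hypothetical values; the real table is far larger)
RawSeq <- data.frame(
  SAMPLE_ID   = c("s1", "s1", "s1", "s2", "s2"),
  ASSAY_ID    = c("a1", "a1", "a2", "a1", "a1"),
  GENOTYPE_ID = c("AA", "AA", "AG", "GG", "AG"),
  stringsAsFactors = FALSE
)

# Count how many times each sample/assay pair was genotyped
nr.attempts <- aggregate(RawSeq$GENOTYPE_ID,
                         list(sample = RawSeq$SAMPLE_ID, assay = RawSeq$ASSAY_ID),
                         length)

# Same counts from one composite key, tabulated in a single pass.
# The "|" separator is an assumption: it must not occur in either ID.
key    <- paste(RawSeq$SAMPLE_ID, RawSeq$ASSAY_ID, sep = "|")
counts <- table(key)

# Pairs that were genotyped more than once
replicated <- names(counts)[counts > 1]
```

Here nr.attempts has one row per sample/assay pair with the count in column x, and replicated holds the pasted keys of the pairs that need checking.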

Instead I used RODBC to push the table into a Microsoft Access database:

sqlSave(channel,RawSeq)

Then I ran a SQL query, courtesy of the Microsoft Access Query Wizard in
design mode:

SELECT RawSeq.SAMPLE_ID, RawSeq.ASSAY_ID,
       Min(RawSeq.GENOTYPE_ID) AS MinOfGENOTYPE_ID,
       Max(RawSeq.GENOTYPE_ID) AS MaxOfGENOTYPE_ID,
       Count(RawSeq.rownames) AS CountOfrownames
FROM RawSeq
WHERE (((RawSeq.GENOTYPE_ID)<>""))
GROUP BY RawSeq.SAMPLE_ID, RawSeq.ASSAY_ID
ORDER BY Count(RawSeq.rownames) DESC;

This way I can compare the minimum and maximum genotype for each sample/assay
pair: if they differ, the replicates are discordant.
Microsoft Access handled it with aplomb. I plan to use RODBC to bring the
result of the SQL query back into R.
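
For comparison, the min/max check that the query performs could be sketched in R roughly as follows. This is only an illustration on made-up data, not a tested replacement for the Access query. It assumes GENOTYPE_ID is character (a factor column would need as.character() first) and that "|" does not occur in either ID.

```r
# Toy stand-in for RawSeq, including one empty genotype call (hypothetical values)
RawSeq <- data.frame(
  SAMPLE_ID   = c("s1", "s1", "s1", "s2", "s2", "s2"),
  ASSAY_ID    = c("a1", "a1", "a2", "a1", "a1", "a2"),
  GENOTYPE_ID = c("AA", "AA", "AG", "GG", "AG", ""),
  stringsAsFactors = FALSE
)

# WHERE GENOTYPE_ID <> ""
called <- RawSeq[RawSeq$GENOTYPE_ID != "", ]

# GROUP BY SAMPLE_ID, ASSAY_ID, expressed as one composite key
key <- paste(called$SAMPLE_ID, called$ASSAY_ID, sep = "|")

# Min(), Max() and Count() per group
lo <- tapply(called$GENOTYPE_ID, key, min)
hi <- tapply(called$GENOTYPE_ID, key, max)
n  <- tapply(called$GENOTYPE_ID, key, length)

# A group is discordant when its smallest and largest genotype differ
discordant <- names(lo)[lo != hi]
```

On this toy table only the s2/a1 pair is flagged, because its two calls (GG and AG) disagree.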

This is the first time I have seen Microsoft Access outpace R.
Is my observation correct, or am I missing something? I would much rather
perform all data manipulation and analyses in R.



-- 
Farrel Buchinsky


______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
