[R] THE EQUIVALENT OF SQL INNER TABLE JOIN IN R

2009-04-26 Thread Nigel Birney

Hello all,

Apologize for the newbie question. What's the easiest way to do a SQL inner
table join in R? 

Say I have a table containing column names A, B, C and another which has
columns named C, D, E. I would like to do an inner table join on C and
produce a table A, B, C, D, E.

thanks a lot,

N.
-- 
View this message in context: 
http://www.nabble.com/THE-EQUIVALENT-OF-SQL-INNER-TABLE-JOIN-IN-R-tp23238179p23238179.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] THE EQUIVALENT OF SQL INNER TABLE JOIN IN R

2009-04-26 Thread Gabor Grothendieck
1. ?merge

2. sqldf package whose home page is at:
http://sqldf.googlecode.com

On Sat, Apr 25, 2009 at 9:15 PM, Nigel Birney  wrote:
>
> Hello all,
>
> Apologize for the newbie question. What's the easiest way to do a SQL inner
> table join in R?
>
> Say I have a table containing column names A, B, C and another which has
> columns named C, D, E. I would like to do an inner table join on C and
> produce a table A, B, C, D, E.
>
> thanks a lot,
>
> N.
> --
> View this message in context: 
> http://www.nabble.com/THE-EQUIVALENT-OF-SQL-INNER-TABLE-JOIN-IN-R-tp23238179p23238179.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] THE EQUIVALENT OF SQL INNER TABLE JOIN IN R

2009-04-26 Thread Peter Dalgaard

Nigel Birney wrote:

Hello all,

Apologize for the newbie question. What's the easiest way to do a SQL inner
table join in R? 


Say I have a table containing column names A, B, C and another which has
columns named C, D, E. I would like to do an inner table join on C and
produce a table A, B, C, D, E.


merge(), perhaps? Otherwise describe what an inner table join does.

-pd

--
   O__   Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] THE EQUIVALENT OF SQL INNER TABLE JOIN IN R

2009-04-26 Thread Wacek Kusnierczyk
Peter Dalgaard wrote:
> Nigel Birney wrote:
>> Hello all,
>>
>> Apologize for the newbie question. What's the easiest way to do a SQL
>> inner
>> table join in R?
>> Say I have a table containing column names A, B, C and another which has
>> columns named C, D, E. I would like to do an inner table join on C and
>> produce a table A, B, C, D, E.
>
> merge(), perhaps? Otherwise describe what an inner table join does.

btw., i think ?merge has it wrong when it comes to the sql join terminology:

" In SQL database terminology, the default value of 'all = FALSE'
 gives a _natural join_, a special case of an _inner join_."

following [1, sec. 6.5] (and in concordance with the typical use of the
terms in the db lingo, as of my rather limited knowledge), a natural
join is a join where values are compared pairwise for columns with the
same names across the joined tables.  the result from merge with
all=FALSE does not have to be a natural join, while it will be an inner
join, as in:

d1 = data.frame(a=1:5, b=rnorm(5))
d2 = data.frame(c=3:7, d=rnorm(5))

merge(d1, d2, all=FALSE)
# 25 rows, a cross join (an outer join)
 # *not* an inner join, even less so a natural join

merge(d1, d2, by.x='a', by.y='c', all=FALSE)
# 3 rows, an inner join
# *not* a natural join

the point is, all=FALSE gives a natural join iff by is equivalent to
intersect(names(x), names(y)), and these two conditions together are
necessary (and sufficient) for a join to be a natural join.

the snippet from ?merge quoted above is wrong and misleading, and should
be corrected to sth like:

" In SQL database terminology, the default value of 'all = FALSE'
 gives an _inner join_.  If, in addition, 'by' is equivalent to
'intersect(names(x), names(y))', the the join is a _natural join_, a
special case of an _inner join_."

or, if the authors insist ?merge is correct, would they provide a reference?

(in fact, the terminology is not that coherent;  e.g., in mysql natural
merely refers to column names, and not to how to choose rows, and one
can have natural outer joins -- which are not, in general, inner joins.)


vQ

[1] c.j. date's, sql and relational theory, o'reilly 2009

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.