Hello,

I'm having trouble discovering what's going wrong with my use of natural
joins via sqldf.

Following the instructions under 4i at http://code.google.com/p/sqldf/,
which discusses creating indices to speed joins, I have been only unreliably
able to get natural joins to work.

For example,

> Tid <- c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
10:58:00', 'AES 01-01-11 12:30:00')
> A <- data.frame(Tid, dfName = 'a')
> B <- data.frame(Tid = Tid[2:4], dfName = 'b')
> C <- data.frame(Tid = Tid[1:3], dfName = 'c')

# then use the sqldf library
> library(sqldf)
> sqldf()

# to create indices on the Tid variable shared across data.frames
> sqldf('create index indA on A(Tid)')
> sqldf('create index indB on B(Tid)')
> sqldf('create index indC on C(Tid)')

# check to make sure everything is there
> sqldf('select * from sqlite_master')

# doing a natural join (implicitly on Tid)
# does not give the expected joins
> sqldf('select * from main.A natural join main.B')
[1] Tid    dfName
<0 rows> (or 0-length row.names)
> sqldf('select * from main.A natural join main.C')
[1] Tid    dfName
<0 rows> (or 0-length row.names)
> sqldf('select * from main.B natural join main.C')
[1] Tid    dfName
<0 rows> (or 0-length row.names)

# even using a where clause (which doesn't have the efficiency qualities I
need the indexed natural joins for) is problematic, setting values of the
dfName variable incorrectly for the data from C
> sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                    Tid dfName                   Tid dfName
1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b

I'm grateful for your guidance on what I'm doing wrong with the natural join
in sqldf.

many thanks,
Nick

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to