Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf.
Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, > Tid <- c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') > A <- data.frame(Tid, dfName = 'a') > B <- data.frame(Tid = Tid[2:4], dfName = 'b') > C <- data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library > library(sqldf) > sqldf() # to create indices on the Tid variable shared across data.frames > sqldf('create index indA on A(Tid)') > sqldf('create index indB on B(Tid)') > sqldf('create index indC on C(Tid)') # check to make sure everything is there > sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins > sqldf('select * from main.A natural join main.B') [1] Tid dfName <0 rows> (or 0-length row.names) > sqldf('select * from main.A natural join main.C') [1] Tid dfName <0 rows> (or 0-length row.names) > sqldf('select * from main.B natural join main.C') [1] Tid dfName <0 rows> (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C > sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.