Re: [R] sqldf not joining all the fields
If I assign the file input to y_data and change you sqldf to > z <- sqldf("select * from x_data left join y_data using (item_code)"); z I can replicate your result. Even after changing the types of the two item_code fields to match I still get the same result and when I see to what degree they share values I get: > sum(x_data$item_code %in% y_data$item_code) [1] 2 > sum(y_data$item_code %in% x_data$item_code) [1] 2 So why are you so sure they are "complete" as you claimed in your first email. -- David. On Mar 12, 2010, at 12:29 PM, David Winsemius wrote: You have now given two different assignments to x_data and none to y_data: The str( from the file access offering: > str(x_data) 'data.frame': 2848 obs. of 5 variables: $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... $ name: chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT CELLULOSE MESO" ... $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ... $ strength: chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ... $ bnf_code: chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ... The str from assignment from the dput offering > str(x_data) 'data.frame': 10 obs. of 10 variables: $ prochi : chr "CAO713" "CAO713" "CAO713" "CAO713" ... $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000" "03/07/2000" ... $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ... $ item_code : chr "842752" "7800" "842652" "842652" ... $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ... $ quantity : chr "60" "100G" "60" "60" ... $ directions : chr "1/D" "A/TD" "1/D" "1/D" ... $ no_of_packs: chr "NULL" "NULL" "NULL" "NULL" ... $ datasource : chr "TSF" "TSF" "TSF" "TSF" ... $ scan_ref_no: chr "NULL" "NULL" "NULL" "NULL" ... This code "worked", but it is not clear that the x-y assignments were correct: x_data <- read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt ", header = TRUE, sep = "|", quote = "\"'", dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, fileEncoding = "", encoding = "unknown") -- David. On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: The y_data file has over 9000 rows in it so I thought it would be more practical to give you the file to download -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Sorry! It is the end of the day for me. So dput(x) structure(list(prochi = c("CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713"), prescribed_date = c("22/06/2001", "28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001", "16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), item_code = c("842752", "7800", "842652", "842652", "842752", "842752", "842752", "39620", "842752", "842752"), res_seqno = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60", "5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D", "1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL" )), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code", "res_seqno", "quantity", "directions", "no_of_packs", "datasource", "scan_ref_no"), row.names = c(NA, 10L), class = "data.frame") y_data <- read.table(file=" http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt";, header = TRUE, sep = "|", quote = "\"'", dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, fileEncoding = "", encoding = "unknown") So the y_data essentially contains the lookup table for the item codes in x. require(sqldf) z <- sqldf("select * from x left join y using (item_code)") z [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Yes, its not clear what data was used. Please provide a single email that contains code that can be copied from R and pasted into another session that will reproduce the problem. x_data <- ...dput output goes here... y_data <- dput output goes here ... library(sqldf) ... rest of code ... or else: x_data <- read.table("http://...";, ...) y_data <- read.table("http://...";. ...) library(sqldf) ... rest of code ... On Fri, Mar 12, 2010 at 12:29 PM, David Winsemius wrote: > You have now given two different assignments to x_data and none to y_data: > > The str( from the file access offering: > >> str(x_data) > 'data.frame': 2848 obs. of 5 variables: > $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... > $ name : chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT > CELLULOSE MESO" ... > $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ... > $ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ... > $ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ... > > The str from assignment from the dput offering >> str(x_data) > 'data.frame': 10 obs. of 10 variables: > $ prochi : chr "CAO713" "CAO713" "CAO713" "CAO713" > ... > $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000" "03/07/2000" > ... > $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ... > $ item_code : chr "842752" "7800" "842652" "842652" ... > $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ... > $ quantity : chr "60" "100G" "60" "60" ... > $ directions : chr "1/D" "A/TD" "1/D" "1/D" ... > $ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ... > $ datasource : chr "TSF" "TSF" "TSF" "TSF" ... > $ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ... > > This code "worked", but it is not clear that the x-y assignments were > correct: > > x_data <- > read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt";, > header = TRUE, sep = "|", quote = "\"'", > dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, > nrows = 3864284, > skip = 0, check.names = TRUE,fill=TRUE, > strip.white = TRUE, blank.lines.skip = TRUE, > comment.char = "#", allowEscapes = FALSE, flush = FALSE, > fileEncoding = "", encoding = "unknown") > > -- > David. > > On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: > >> >> The y_data file has over 9000 rows in it so I thought it would be more >> practical to give you the file to download >> -- >> View this message in context: >> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html >> Sent from the R help mailing list archive at Nabble.com. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Sorry! It is the end of the day for me. So dput(x) structure(list(prochi = c("CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713"), prescribed_date = c("22/06/2001", "28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001", "16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), item_code = c("842752", "7800", "842652", "842652", "842752", "842752", "842752", "39620", "842752", "842752"), res_seqno = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60", "5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D", "1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL" )), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code", "res_seqno", "quantity", "directions", "no_of_packs", "datasource", "scan_ref_no"), row.names = c(NA, 10L), class = "data.frame") y_data <- read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt";, header = TRUE, sep = "|", quote = "\"'", dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, fileEncoding = "", encoding = "unknown") So the y_data essentially contains the lookup table for the item codes in x. Thanks and sorry for the mix up -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590849.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
You have now given two different assignments to x_data and none to y_data: The str( from the file access offering: > str(x_data) 'data.frame': 2848 obs. of 5 variables: $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... $ name: chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT CELLULOSE MESO" ... $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ... $ strength: chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ... $ bnf_code: chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ... The str from assignment from the dput offering > str(x_data) 'data.frame': 10 obs. of 10 variables: $ prochi : chr "CAO713" "CAO713" "CAO713" "CAO713" ... $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000" "03/07/2000" ... $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ... $ item_code : chr "842752" "7800" "842652" "842652" ... $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ... $ quantity : chr "60" "100G" "60" "60" ... $ directions : chr "1/D" "A/TD" "1/D" "1/D" ... $ no_of_packs: chr "NULL" "NULL" "NULL" "NULL" ... $ datasource : chr "TSF" "TSF" "TSF" "TSF" ... $ scan_ref_no: chr "NULL" "NULL" "NULL" "NULL" ... This code "worked", but it is not clear that the x-y assignments were correct: x_data <- read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt ", header = TRUE, sep = "|", quote = "\"'", dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, fileEncoding = "", encoding = "unknown") -- David. On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: The y_data file has over 9000 rows in it so I thought it would be more practical to give you the file to download -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
The y_data file has over 9000 rows in it so I thought it would be more practical to give you the file to download -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
What about y_data? On Fri, Mar 12, 2010 at 12:14 PM, Newbie19_02 wrote: > > dput(x_data) > > structure(list(prochi = c("CAO713", "CAO713", "CAO713", > "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", > "CAO713", "CAO713"), prescribed_date = c("22/06/2001", > "28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001", > "16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth = > c("NULL", > "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", > "NULL"), item_code = c("842752", "7800", "842652", "842652", > "842752", "842752", "842752", "39620", "842752", "842752"), res_seqno = > c("NULL", > "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", > "NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60", > "5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D", > "1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL", > "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", > "NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF", > "TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL", > "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL" > )), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code", > "res_seqno", "quantity", "directions", "no_of_packs", "datasource", > "scan_ref_no"), row.names = c(NA, 10L), class = "data.frame") > > -- > View this message in context: > http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Please provide code that I can just copy from your post and paste into my session. Either provide dput output as requested or provide the files on the internet together with code that reads them off the internet. On Fri, Mar 12, 2010 at 12:06 PM, Newbie19_02 wrote: > > http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt > feb09_267_presc_items_tsf.txt > > is the total file for y so if I use the command line with the total data for > y then I get the output specified in z > > Thanks, > Natalie > -- > View this message in context: > http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
y_data <- read.table("feb_267_presc_items_tsf.txt", header = TRUE, sep = "|", quote = "\"'", dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, fileEncoding = "", encoding = "unknown") Will read the file in same that I have and I have posted the dput. Sorry for not giving you what you originally wanted... Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590826.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
dput(x_data) structure(list(prochi = c("CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713", "CAO713"), prescribed_date = c("22/06/2001", "28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001", "16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), item_code = c("842752", "7800", "842652", "842652", "842752", "842752", "842752", "39620", "842752", "842752"), res_seqno = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60", "5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D", "1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL" )), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code", "res_seqno", "quantity", "directions", "no_of_packs", "datasource", "scan_ref_no"), row.names = c(NA, 10L), class = "data.frame") -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt feb09_267_presc_items_tsf.txt is the total file for y so if I use the command line with the total data for y then I get the output specified in z Thanks, Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Can you show the output of dput(x_data) and dput(y_data). On Fri, Mar 12, 2010 at 11:56 AM, Newbie19_02 wrote: > > Dear R users, > > I have two data frames that were read from text files as follows: > > x_data <- read.table("x.txt", header = TRUE, sep = "|", quote = "\"'", > dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, > nrows = 3864284, > skip = 0, check.names = TRUE,fill=TRUE, > strip.white = TRUE, blank.lines.skip = TRUE, > comment.char = "#", allowEscapes = FALSE, flush = FALSE, > fileEncoding = "", encoding = "unknown") > > x_data > > prochi prescribed_date dataMonth item_code res_seqno quantity directions > CAO713 22/06/2001 NULL 842752 NULL 60 1/D > CAO713 28/04/2000 NULL 7800 NULL 100G A/TD > CAO713 10/04/2000 NULL 842652 NULL 60 1/D > CAO713 03/07/2000 NULL 842652 NULL 60 1/D > CAO713 09/01/2001 NULL 842752 NULL 60 1/D > CAO713 16/10/2001 NULL 842752 NULL 60 1/D > CAO713 16/08/2001 NULL 842752 NULL 60 1/D > CAO713 17/09/1993 NULL 39620 NULL 5ML NIL > CAO713 01/05/2001 NULL 842752 NULL 60 1/D > CAO713 05/03/2001 NULL 842752 NULL 60 1/D > > > > y_data > > item_code name formulation_code strength > bnf_code > 100 NEONACLEX K TABS NULL 2.2.8 > 110 NEONACLEX TABS 5MG 2.2.1 > 50 MESORB DRESS 10CMX10CM 20.3.1 > 160 ABSORBENT CELLULOSE MESO DRESS 10CMX10CM 20.3.1 > 161 ABSORBENT CELLULOSE MESO DRESS 10CMX15CM 20.3.1 > 164 ABSORBENT CELLULOSE MESO DRESS 20CMX25CM 20.3.1 > 200 SEPTRIN TABS 480MG 5.1.8 > 210 SEPTRIN PAED SF SUSP 240MG/5ML 5.1.8 > 212 SEPTRIN ADULT SUSP 480MG/5ML 5.1.8 > 220 SEPTRIN FORTE TABS 960MG 5.1.8 > etc > > > contains all the information for the item codes > y was read in in the same way. > > I then used the following code: > > z <- sqldf("select * from x left join y using (code)") > > when I use this on my real data I get an output: > prochi prescribed_date dataMonth item_code res_seqno quantity directions > 1 CAO713 22/06/2001 NULL 842752 NULL 60 > 1/D > 2 CAO713 28/04/2000 NULL 7800 NULL 100G > A/TD > 3 CAO713 10/04/2000 NULL 842652 NULL 60 > 1/D > 4 CAO713 03/07/2000 NULL 842652 NULL 60 > 1/D > 5 CAO713 09/01/2001 NULL 842752 NULL 60 > 1/D > 6 CAO713 16/10/2001 NULL 842752 NULL 60 > 1/D > 7 CAO713 16/08/2001 NULL 842752 NULL 60 > 1/D > 8 CAO713 17/09/1993 NULL 39620 NULL 5ML > NIL > 9 CAO713 01/05/2001 NULL 842752 NULL 60 > 1/D > 10 CAO713 05/03/2001 NULL 842752 NULL 60 > 1/D > no_of_packs datasource scan_ref_no name formulation_code strength > 1 NULL TSF NULL > 2 NULL TSF NULL BETNOVATE RD OINT 0.025% > 3 NULL TSF NULL > 4 NULL TSF NULL > 5 NULL TSF NULL > 6 NULL TSF NULL > 7 NULL TSF NULL > 8 NULL TSF NULL GAMMABULIN INJ 320MG > 9 NULL TSF NULL > 10 NULL TSF NULL > bnf_code > 1 > 2 13.4.1.2 > 3 > 4 > 5 > 6 > 7 > 8 14.5 > 9 > 10 > > > There is absolutely no reason for there to be anywhere as the > information for both the tables is complete. > > Not sure what the problem is? > > Thanks, > Natalie > -- > View this message in context: > http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
[R] sqldf not joining all the fields
Dear R users, I have two data frames that were read from text files as follows: x_data <- read.table("x.txt", header = TRUE, sep = "|", quote = "\"'", dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, fileEncoding = "", encoding = "unknown") x_data prochi prescribed_date dataMonth item_code res_seqno quantity directions CAO713 22/06/2001 NULL842752 NULL 601/D CAO713 28/04/2000 NULL 7800 NULL 100G A/TD CAO713 10/04/2000 NULL842652 NULL 601/D CAO713 03/07/2000 NULL842652 NULL 601/D CAO713 09/01/2001 NULL842752 NULL 601/D CAO713 16/10/2001 NULL842752 NULL 601/D CAO713 16/08/2001 NULL842752 NULL 601/D CAO713 17/09/1993 NULL 39620 NULL 5MLNIL CAO713 01/05/2001 NULL842752 NULL 601/D CAO713 05/03/2001 NULL842752 NULL 601/D y_data item_codename formulation_code strength bnf_code 100 NEONACLEX KTABS NULL2.2.8 110NEONACLEX TABS 5MG2.2.1 50 MESORB DRESS 10CMX10CM 20.3.1 160 ABSORBENT CELLULOSE MESODRESS 10CMX10CM 20.3.1 161 ABSORBENT CELLULOSE MESODRESS 10CMX15CM 20.3.1 164 ABSORBENT CELLULOSE MESODRESS 20CMX25CM 20.3.1 200 SEPTRINTABS 480MG5.1.8 210 SEPTRIN PAED SFSUSP 240MG/5ML5.1.8 212SEPTRIN ADULT SUSP 480MG/5ML5.1.8 220SEPTRIN FORTE TABS 960MG5.1.8 etc contains all the information for the item codes y was read in in the same way. I then used the following code: z <- sqldf("select * from x left join y using (code)") when I use this on my real data I get an output: prochi prescribed_date dataMonth item_code res_seqno quantity directions 1 CAO713 22/06/2001 NULL842752 NULL 60 1/D 2 CAO713 28/04/2000 NULL 7800 NULL 100G A/TD 3 CAO713 10/04/2000 NULL842652 NULL 60 1/D 4 CAO713 03/07/2000 NULL842652 NULL 60 1/D 5 CAO713 09/01/2001 NULL842752 NULL 60 1/D 6 CAO713 16/10/2001 NULL842752 NULL 60 1/D 7 CAO713 16/08/2001 NULL842752 NULL 60 1/D 8 CAO713 17/09/1993 NULL 39620 NULL 5ML NIL 9 CAO713 01/05/2001 NULL842752 NULL 60 1/D 10 CAO713 05/03/2001 NULL842752 NULL 60 1/D no_of_packs datasource scan_ref_no name formulation_code strength 1 NULLTSFNULL 2 NULLTSFNULL BETNOVATE RD OINT 0.025% 3 NULLTSFNULL 4 NULLTSFNULL 5 NULLTSFNULL 6 NULLTSFNULL 7 NULLTSFNULL 8 NULLTSFNULL GAMMABULIN INJ320MG 9 NULLTSFNULL 10NULLTSFNULL bnf_code 1 2 13.4.1.2 3 4 5 6 7 8 14.5 9 10 There is absolutely no reason for there to be anywhere as the information for both the tables is complete. Not sure what the problem is? Thanks, Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.