---------- Forwarded message ---------- From: Natalie Van Zuydam <nvanzuy...@gmail.com> Date: Fri, Mar 12, 2010 at 5:49 PM Subject: Re: [R] sqldf not joining all the fields To: David Winsemius <dwinsem...@comcast.net>
Dear David I'm not sure what the problem is as for every item code there is a corresponding information in the y_data. For example 842752 from the x_data corresponds to Aspirin in the y_data? Yet when I use sqldf to join the two df's I get NA values in the columns from the y_data in z for 842752 item code....is there something wrong with my sqldf code or something wrong with the way I have inputed the data frames? Thanks for taking the time to help me, Natalie On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius <dwinsem...@comcast.net>wrote: > If I assign the file input to y_data and change you sqldf to > > > z <- sqldf("select * from x_data left join y_data using (item_code)"); z > > I can replicate your result. Even after changing the types of the two > item_code fields to match I still get the same result and when I see to what > degree they share values I get: > > > sum(x_data$item_code %in% y_data$item_code) > [1] 2 > > sum(y_data$item_code %in% x_data$item_code) > [1] 2 > > > So why are you so sure they are "complete" as you claimed in your first > email. > > -- > David. > > > > On Mar 12, 2010, at 12:29 PM, David Winsemius wrote: > > You have now given two different assignments to x_data and none to y_data: >> >> The str( from the file access offering: >> >> > str(x_data) >> 'data.frame': 2848 obs. of 5 variables: >> $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... >> $ name : chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT >> CELLULOSE MESO" ... >> $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ... >> $ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ... >> $ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ... >> >> The str from assignment from the dput offering >> > str(x_data) >> 'data.frame': 10 obs. of 10 variables: >> $ prochi : chr "CAO0000713" "CAO0000713" "CAO0000713" >> "CAO0000713" ... >> $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000" >> "03/07/2000" ... >> $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ... >> $ item_code : chr "842752" "7800" "842652" "842652" ... >> $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ... >> $ quantity : chr "60" "100G" "60" "60" ... >> $ directions : chr "1/D" "A/TD" "1/D" "1/D" ... >> $ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ... >> $ datasource : chr "TSF" "TSF" "TSF" "TSF" ... >> $ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ... >> >> This code "worked", but it is not clear that the x-y assignments were >> correct: >> >> x_data <- read.table(file=" >> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt", header >> = TRUE, sep = "|", quote = "\"'", >> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, >> nrows = 3864284, >> skip = 0, check.names = TRUE,fill=TRUE, >> strip.white = TRUE, blank.lines.skip = TRUE, >> comment.char = "#", allowEscapes = FALSE, flush = FALSE, >> fileEncoding = "", encoding = "unknown") >> >> -- >> David. >> >> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: >> >> >>> The y_data file has over 9000 rows in it so I thought it would be more >>> practical to give you the file to download.... >>> -- >>> View this message in context: >>> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> David Winsemius, MD >> West Hartford, CT >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > David Winsemius, MD > West Hartford, CT > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.