That is not so. 842752 does not exist in y$item_code and ASPIRIN has a code of 22730.
> 842752 %in% y$item_code [1] FALSE > subset(y, name == "ASPIRIN") item_code name formulation_code strength bnf_code 850 22730 ASPIRIN TABS 300MG 4.7.1 855 22780 ASPIRIN PDR NULL 4.7.1 856 22790 ASPIRIN MIXT $ 4.7.1 On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam <nvanzuy...@gmail.com> wrote: > ---------- Forwarded message ---------- > From: Natalie Van Zuydam <nvanzuy...@gmail.com> > Date: Fri, Mar 12, 2010 at 5:49 PM > Subject: Re: [R] sqldf not joining all the fields > To: David Winsemius <dwinsem...@comcast.net> > > > Dear David > > I'm not sure what the problem is as for every item code there is a > corresponding information in the y_data. For example 842752 from the x_data > corresponds to Aspirin in the y_data? Yet when I use sqldf to join the two > df's I get NA values in the columns from the y_data in z for 842752 item > code....is there something wrong with my sqldf code or something wrong with > the way I have inputed the data frames? > > Thanks for taking the time to help me, > Natalie > > > > On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius > <dwinsem...@comcast.net>wrote: > >> If I assign the file input to y_data and change you sqldf to >> >> > z <- sqldf("select * from x_data left join y_data using (item_code)"); z >> >> I can replicate your result. Even after changing the types of the two >> item_code fields to match I still get the same result and when I see to what >> degree they share values I get: >> >> > sum(x_data$item_code %in% y_data$item_code) >> [1] 2 >> > sum(y_data$item_code %in% x_data$item_code) >> [1] 2 >> >> >> So why are you so sure they are "complete" as you claimed in your first >> email. >> >> -- >> David. >> >> >> >> On Mar 12, 2010, at 12:29 PM, David Winsemius wrote: >> >> You have now given two different assignments to x_data and none to y_data: >>> >>> The str( from the file access offering: >>> >>> > str(x_data) >>> 'data.frame': 2848 obs. of 5 variables: >>> $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... >>> $ name : chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT >>> CELLULOSE MESO" ... >>> $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ... >>> $ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ... >>> $ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ... >>> >>> The str from assignment from the dput offering >>> > str(x_data) >>> 'data.frame': 10 obs. of 10 variables: >>> $ prochi : chr "CAO0000713" "CAO0000713" "CAO0000713" >>> "CAO0000713" ... >>> $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000" >>> "03/07/2000" ... >>> $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ... >>> $ item_code : chr "842752" "7800" "842652" "842652" ... >>> $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ... >>> $ quantity : chr "60" "100G" "60" "60" ... >>> $ directions : chr "1/D" "A/TD" "1/D" "1/D" ... >>> $ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ... >>> $ datasource : chr "TSF" "TSF" "TSF" "TSF" ... >>> $ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ... >>> >>> This code "worked", but it is not clear that the x-y assignments were >>> correct: >>> >>> x_data <- read.table(file=" >>> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt", header >>> = TRUE, sep = "|", quote = "\"'", >>> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, >>> nrows = 3864284, >>> skip = 0, check.names = TRUE,fill=TRUE, >>> strip.white = TRUE, blank.lines.skip = TRUE, >>> comment.char = "#", allowEscapes = FALSE, flush = FALSE, >>> fileEncoding = "", encoding = "unknown") >>> >>> -- >>> David. >>> >>> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: >>> >>> >>>> The y_data file has over 9000 rows in it so I thought it would be more >>>> practical to give you the file to download.... >>>> -- >>>> View this message in context: >>>> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html >>>> Sent from the R help mailing list archive at Nabble.com. >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >>> David Winsemius, MD >>> West Hartford, CT >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> David Winsemius, MD >> West Hartford, CT >> >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.