842752|ASPIRIN DISP AAH|TABS|75MG|2.9 is taken directly from my y_data text file.
If I search for ASPIRIN DISP AAH %in% y$name I get: FALSE. Despite the fact that it is there in the text file that I loaded into y. There must be a problem with my input. tsf_data <- read.table("feb09_267_presc_items_tsf.txt", header = TRUE, sep = "|", quote = "\"'", dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, fileEncoding = "", encoding = "unknown") Would fill=TRUE and strip.white=TRUE affect how the item_code columns are matched? When I look at the file in a script editor I cannot see any strange symbols or formatting? I have had to use fill otherwise I get the following error message: Warning message: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : number of items read is not a multiple of the number of columns Not sure if correcting this in read.table is affecting the join function of sqldb? Thanks, Natalie I also tried 842752 On Fri, Mar 12, 2010 at 6:05 PM, Gabor Grothendieck <ggrothendi...@gmail.com > wrote: > That is not so. 842752 does not exist in y$item_code and ASPIRIN has > a code of 22730. > > > 842752 %in% y$item_code > [1] FALSE > > > subset(y, name == "ASPIRIN") > item_code name formulation_code strength bnf_code > 850 22730 ASPIRIN TABS 300MG 4.7.1 > 855 22780 ASPIRIN PDR NULL 4.7.1 > 856 22790 ASPIRIN MIXT $ 4.7.1 > > > On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam > <nvanzuy...@gmail.com> wrote: > > ---------- Forwarded message ---------- > > From: Natalie Van Zuydam <nvanzuy...@gmail.com> > > Date: Fri, Mar 12, 2010 at 5:49 PM > > Subject: Re: [R] sqldf not joining all the fields > > To: David Winsemius <dwinsem...@comcast.net> > > > > > > Dear David > > > > I'm not sure what the problem is as for every item code there is a > > corresponding information in the y_data. For example 842752 from the > x_data > > corresponds to Aspirin in the y_data? Yet when I use sqldf to join the > two > > df's I get NA values in the columns from the y_data in z for 842752 item > > code....is there something wrong with my sqldf code or something wrong > with > > the way I have inputed the data frames? > > > > Thanks for taking the time to help me, > > Natalie > > > > > > > > On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius <dwinsem...@comcast.net > >wrote: > > > >> If I assign the file input to y_data and change you sqldf to > >> > >> > z <- sqldf("select * from x_data left join y_data using > (item_code)"); z > >> > >> I can replicate your result. Even after changing the types of the two > >> item_code fields to match I still get the same result and when I see to > what > >> degree they share values I get: > >> > >> > sum(x_data$item_code %in% y_data$item_code) > >> [1] 2 > >> > sum(y_data$item_code %in% x_data$item_code) > >> [1] 2 > >> > >> > >> So why are you so sure they are "complete" as you claimed in your first > >> email. > >> > >> -- > >> David. > >> > >> > >> > >> On Mar 12, 2010, at 12:29 PM, David Winsemius wrote: > >> > >> You have now given two different assignments to x_data and none to > y_data: > >>> > >>> The str( from the file access offering: > >>> > >>> > str(x_data) > >>> 'data.frame': 2848 obs. of 5 variables: > >>> $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... > >>> $ name : chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT > >>> CELLULOSE MESO" ... > >>> $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ... > >>> $ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ... > >>> $ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ... > >>> > >>> The str from assignment from the dput offering > >>> > str(x_data) > >>> 'data.frame': 10 obs. of 10 variables: > >>> $ prochi : chr "CAO0000713" "CAO0000713" "CAO0000713" > >>> "CAO0000713" ... > >>> $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000" > >>> "03/07/2000" ... > >>> $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ... > >>> $ item_code : chr "842752" "7800" "842652" "842652" ... > >>> $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ... > >>> $ quantity : chr "60" "100G" "60" "60" ... > >>> $ directions : chr "1/D" "A/TD" "1/D" "1/D" ... > >>> $ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ... > >>> $ datasource : chr "TSF" "TSF" "TSF" "TSF" ... > >>> $ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ... > >>> > >>> This code "worked", but it is not clear that the x-y assignments were > >>> correct: > >>> > >>> x_data <- read.table(file=" > >>> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt", > header > >>> = TRUE, sep = "|", quote = "\"'", > >>> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, > >>> nrows = 3864284, > >>> skip = 0, check.names = TRUE,fill=TRUE, > >>> strip.white = TRUE, blank.lines.skip = TRUE, > >>> comment.char = "#", allowEscapes = FALSE, flush = FALSE, > >>> fileEncoding = "", encoding = "unknown") > >>> > >>> -- > >>> David. > >>> > >>> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: > >>> > >>> > >>>> The y_data file has over 9000 rows in it so I thought it would be more > >>>> practical to give you the file to download.... > >>>> -- > >>>> View this message in context: > >>>> > http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html > >>>> Sent from the R help mailing list archive at Nabble.com. > >>>> > >>>> ______________________________________________ > >>>> R-help@r-project.org mailing list > >>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>> PLEASE do read the posting guide > >>>> http://www.R-project.org/posting-guide.html > >>>> and provide commented, minimal, self-contained, reproducible code. > >>>> > >>> > >>> David Winsemius, MD > >>> West Hartford, CT > >>> > >>> ______________________________________________ > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > >> > >> David Winsemius, MD > >> West Hartford, CT > >> > >> > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.