Re: [R] sqldf not joining all the fields

2010-03-12 Thread David Winsemius

If I assign the file input to y_data and change you sqldf to

> z  <- sqldf("select * from x_data left join y_data using  
(item_code)"); z


I can replicate your result. Even after changing the types of the two  
item_code fields to match I still get the same result and when I see  
to what degree they share values I get:


> sum(x_data$item_code %in% y_data$item_code)
[1] 2
> sum(y_data$item_code %in% x_data$item_code)
[1] 2


So why are you so sure they are "complete" as you claimed in your  
first email.


--
David.


On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:

You have now given two different assignments to x_data and none to  
y_data:


The str( from the file access offering:

> str(x_data)
'data.frame':   2848 obs. of  5 variables:
$ item_code   : int  100 110 150 160 161 164 200 210 212 220 ...
$ name: chr  "NEONACLEX K" "NEONACLEX" "MESORB"  
"ABSORBENT CELLULOSE MESO" ...

$ formulation_code: chr  "TABS" "TABS" "DRESS" "DRESS" ...
$ strength: chr  "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
$ bnf_code: chr  "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...

The str from assignment from the dput offering
> str(x_data)
'data.frame':   10 obs. of  10 variables:
$ prochi : chr  "CAO713" "CAO713" "CAO713"  
"CAO713" ...
$ prescribed_date: chr  "22/06/2001" "28/04/2000" "10/04/2000"  
"03/07/2000" ...

$ dataMonth  : chr  "NULL" "NULL" "NULL" "NULL" ...
$ item_code  : chr  "842752" "7800" "842652" "842652" ...
$ res_seqno  : chr  "NULL" "NULL" "NULL" "NULL" ...
$ quantity   : chr  "60" "100G" "60" "60" ...
$ directions : chr  "1/D" "A/TD" "1/D" "1/D" ...
$ no_of_packs: chr  "NULL" "NULL" "NULL" "NULL" ...
$ datasource : chr  "TSF" "TSF" "TSF" "TSF" ...
$ scan_ref_no: chr  "NULL" "NULL" "NULL" "NULL" ...

This code "worked", but it is not clear that the x-y assignments  
were correct:


x_data <- read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt 
", header = TRUE, sep = "|", quote = "\"'",
  dec = ".",as.is = TRUE,na.strings = "NA",colClasses =  
NA,

nrows = 3864284,
  skip = 0, check.names = TRUE,fill=TRUE,
  strip.white = TRUE, blank.lines.skip = TRUE,
  comment.char = "#", allowEscapes = FALSE, flush = FALSE,
  fileEncoding = "", encoding = "unknown")

--
David.

On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:



The y_data file has over 9000 rows in it so I thought it would be  
more

practical to give you the file to download
--
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Natalie Van Zuydam
Sorry!  It is the end of the day for me.

So

dput(x)

structure(list(prochi = c("CAO713", "CAO713", "CAO713",
"CAO713", "CAO713", "CAO713", "CAO713", "CAO713",
"CAO713", "CAO713"), prescribed_date = c("22/06/2001",
"28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001",
"16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth =
c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), item_code = c("842752", "7800", "842652", "842652",
"842752", "842752", "842752", "39620", "842752", "842752"), res_seqno =
c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60",
"5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D",
"1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF",
"TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
)), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code",
"res_seqno", "quantity", "directions", "no_of_packs", "datasource",
"scan_ref_no"), row.names = c(NA, 10L), class = "data.frame")


y_data <-
read.table(file="
http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt";,
header = TRUE, sep = "|", quote = "\"'",

 dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
nrows = 3864284,
 skip = 0, check.names = TRUE,fill=TRUE,
 strip.white = TRUE, blank.lines.skip = TRUE,
 comment.char = "#", allowEscapes = FALSE, flush = FALSE,
 fileEncoding = "", encoding = "unknown")

So the y_data essentially contains the lookup table for the item codes in x.

require(sqldf)
z <- sqldf("select * from x left join y using (item_code)")
z

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Yes, its not clear what data was used.  Please provide a single email
that contains code that can be copied from R and pasted into another
session that will reproduce the problem.

x_data <- ...dput output goes here...
y_data <- dput output goes here ...
library(sqldf)
... rest of code ...

or else:

x_data <- read.table("http://...";, ...)
y_data <- read.table("http://...";. ...)
library(sqldf)
... rest of code ...

On Fri, Mar 12, 2010 at 12:29 PM, David Winsemius
 wrote:
> You have now given two different assignments to x_data and none to y_data:
>
> The str( from the file access offering:
>
>> str(x_data)
> 'data.frame':   2848 obs. of  5 variables:
>  $ item_code       : int  100 110 150 160 161 164 200 210 212 220 ...
>  $ name            : chr  "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT
> CELLULOSE MESO" ...
>  $ formulation_code: chr  "TABS" "TABS" "DRESS" "DRESS" ...
>  $ strength        : chr  "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
>  $ bnf_code        : chr  "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...
>
> The str from assignment from the dput offering
>> str(x_data)
> 'data.frame':   10 obs. of  10 variables:
>  $ prochi         : chr  "CAO713" "CAO713" "CAO713" "CAO713"
> ...
>  $ prescribed_date: chr  "22/06/2001" "28/04/2000" "10/04/2000" "03/07/2000"
> ...
>  $ dataMonth      : chr  "NULL" "NULL" "NULL" "NULL" ...
>  $ item_code      : chr  "842752" "7800" "842652" "842652" ...
>  $ res_seqno      : chr  "NULL" "NULL" "NULL" "NULL" ...
>  $ quantity       : chr  "60" "100G" "60" "60" ...
>  $ directions     : chr  "1/D" "A/TD" "1/D" "1/D" ...
>  $ no_of_packs    : chr  "NULL" "NULL" "NULL" "NULL" ...
>  $ datasource     : chr  "TSF" "TSF" "TSF" "TSF" ...
>  $ scan_ref_no    : chr  "NULL" "NULL" "NULL" "NULL" ...
>
> This code "worked", but it is not clear that the x-y assignments were
> correct:
>
> x_data <-
> read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt";,
> header = TRUE, sep = "|", quote = "\"'",
>               dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
> nrows = 3864284,
>               skip = 0, check.names = TRUE,fill=TRUE,
>               strip.white = TRUE, blank.lines.skip = TRUE,
>               comment.char = "#", allowEscapes = FALSE, flush = FALSE,
>               fileEncoding = "", encoding = "unknown")
>
> --
> David.
>
> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:
>
>>
>> The y_data file has over 9000 rows in it so I thought it would be more
>> practical to give you the file to download
>> --
>> View this message in context:
>> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

Sorry!  It is the end of the day for me.

So 

dput(x)

structure(list(prochi = c("CAO713", "CAO713", "CAO713", 
"CAO713", "CAO713", "CAO713", "CAO713", "CAO713", 
"CAO713", "CAO713"), prescribed_date = c("22/06/2001", 
"28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001", 
"16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth =
c("NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", 
"NULL"), item_code = c("842752", "7800", "842652", "842652", 
"842752", "842752", "842752", "39620", "842752", "842752"), res_seqno =
c("NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", 
"NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60", 
"5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D", 
"1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", 
"NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF", 
"TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
)), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code", 
"res_seqno", "quantity", "directions", "no_of_packs", "datasource", 
"scan_ref_no"), row.names = c(NA, 10L), class = "data.frame")


y_data <-
read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt";,
header = TRUE, sep = "|", quote = "\"'",

  dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
nrows = 3864284,
  skip = 0, check.names = TRUE,fill=TRUE,
  strip.white = TRUE, blank.lines.skip = TRUE,
  comment.char = "#", allowEscapes = FALSE, flush = FALSE,
  fileEncoding = "", encoding = "unknown")

So the y_data essentially contains the lookup table for the item codes in x.

Thanks and sorry for the mix up
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590849.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread David Winsemius
You have now given two different assignments to x_data and none to  
y_data:


The str( from the file access offering:

> str(x_data)
'data.frame':   2848 obs. of  5 variables:
 $ item_code   : int  100 110 150 160 161 164 200 210 212 220 ...
 $ name: chr  "NEONACLEX K" "NEONACLEX" "MESORB"  
"ABSORBENT CELLULOSE MESO" ...

 $ formulation_code: chr  "TABS" "TABS" "DRESS" "DRESS" ...
 $ strength: chr  "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
 $ bnf_code: chr  "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...

The str from assignment from the dput offering
> str(x_data)
'data.frame':   10 obs. of  10 variables:
 $ prochi : chr  "CAO713" "CAO713" "CAO713"  
"CAO713" ...
 $ prescribed_date: chr  "22/06/2001" "28/04/2000" "10/04/2000"  
"03/07/2000" ...

 $ dataMonth  : chr  "NULL" "NULL" "NULL" "NULL" ...
 $ item_code  : chr  "842752" "7800" "842652" "842652" ...
 $ res_seqno  : chr  "NULL" "NULL" "NULL" "NULL" ...
 $ quantity   : chr  "60" "100G" "60" "60" ...
 $ directions : chr  "1/D" "A/TD" "1/D" "1/D" ...
 $ no_of_packs: chr  "NULL" "NULL" "NULL" "NULL" ...
 $ datasource : chr  "TSF" "TSF" "TSF" "TSF" ...
 $ scan_ref_no: chr  "NULL" "NULL" "NULL" "NULL" ...

This code "worked", but it is not clear that the x-y assignments were  
correct:


x_data <- read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt 
", header = TRUE, sep = "|", quote = "\"'",
   dec = ".",as.is = TRUE,na.strings = "NA",colClasses =  
NA,

nrows = 3864284,
   skip = 0, check.names = TRUE,fill=TRUE,
   strip.white = TRUE, blank.lines.skip = TRUE,
   comment.char = "#", allowEscapes = FALSE, flush = FALSE,
   fileEncoding = "", encoding = "unknown")

--
David.

On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:



The y_data file has over 9000 rows in it so I thought it would be more
practical to give you the file to download
--
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

The y_data file has over 9000 rows in it so I thought it would be more
practical to give you the file to download
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
What  about y_data?

On Fri, Mar 12, 2010 at 12:14 PM, Newbie19_02  wrote:
>
> dput(x_data)
>
> structure(list(prochi = c("CAO713", "CAO713", "CAO713",
> "CAO713", "CAO713", "CAO713", "CAO713", "CAO713",
> "CAO713", "CAO713"), prescribed_date = c("22/06/2001",
> "28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001",
> "16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth =
> c("NULL",
> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
> "NULL"), item_code = c("842752", "7800", "842652", "842652",
> "842752", "842752", "842752", "39620", "842752", "842752"), res_seqno =
> c("NULL",
> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
> "NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60",
> "5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D",
> "1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL",
> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
> "NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF",
> "TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL",
> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
> )), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code",
> "res_seqno", "quantity", "directions", "no_of_packs", "datasource",
> "scan_ref_no"), row.names = c(NA, 10L), class = "data.frame")
>
> --
> View this message in context: 
> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Please provide code that I can just copy from your post and paste into
my session.   Either provide dput output as requested or provide the
files on the internet together with code that reads them off the
internet.

On Fri, Mar 12, 2010 at 12:06 PM, Newbie19_02  wrote:
>
> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt
> feb09_267_presc_items_tsf.txt
>
> is the total file for y so if I use the command line with the total data for
> y then I get the output specified in z
>
> Thanks,
> Natalie
> --
> View this message in context: 
> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

y_data <- read.table("feb_267_presc_items_tsf.txt", header = TRUE, sep = "|",
quote = "\"'",
dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = "#", allowEscapes = FALSE, flush = FALSE,
fileEncoding = "", encoding = "unknown") 

Will read the file in same that I have and I have posted the dput.

Sorry for not giving you what you originally wanted...

Natalie
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590826.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

dput(x_data)

structure(list(prochi = c("CAO713", "CAO713", "CAO713", 
"CAO713", "CAO713", "CAO713", "CAO713", "CAO713", 
"CAO713", "CAO713"), prescribed_date = c("22/06/2001", 
"28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001", 
"16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth =
c("NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", 
"NULL"), item_code = c("842752", "7800", "842652", "842652", 
"842752", "842752", "842752", "39620", "842752", "842752"), res_seqno =
c("NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", 
"NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60", 
"5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D", 
"1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", 
"NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF", 
"TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL", 
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
)), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code", 
"res_seqno", "quantity", "directions", "no_of_packs", "datasource", 
"scan_ref_no"), row.names = c(NA, 10L), class = "data.frame")

-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt
feb09_267_presc_items_tsf.txt 

is the total file for y so if I use the command line with the total data for
y then I get the output specified in z

Thanks,
Natalie
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Can you show the output of dput(x_data) and dput(y_data).

On Fri, Mar 12, 2010 at 11:56 AM, Newbie19_02  wrote:
>
> Dear R users,
>
> I have two data frames that were read from text files as follows:
>
> x_data <- read.table("x.txt", header = TRUE, sep = "|", quote = "\"'",
>                dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
> nrows = 3864284,
>                skip = 0, check.names = TRUE,fill=TRUE,
>                strip.white = TRUE, blank.lines.skip = TRUE,
>                comment.char = "#", allowEscapes = FALSE, flush = FALSE,
>                fileEncoding = "", encoding = "unknown")
>
> x_data
>
> prochi prescribed_date dataMonth item_code res_seqno quantity directions
> CAO713      22/06/2001      NULL    842752      NULL       60        1/D
> CAO713      28/04/2000      NULL      7800      NULL     100G       A/TD
> CAO713      10/04/2000      NULL    842652      NULL       60        1/D
> CAO713      03/07/2000      NULL    842652      NULL       60        1/D
> CAO713      09/01/2001      NULL    842752      NULL       60        1/D
> CAO713      16/10/2001      NULL    842752      NULL       60        1/D
> CAO713      16/08/2001      NULL    842752      NULL       60        1/D
> CAO713      17/09/1993      NULL     39620      NULL      5ML        NIL
> CAO713      01/05/2001      NULL    842752      NULL       60        1/D
> CAO713      05/03/2001      NULL    842752      NULL       60        1/D
>
>
>
> y_data
>
>  item_code    name                              formulation_code  strength
> bnf_code
> 100              NEONACLEX K                    TABS      NULL    2.2.8
> 110                NEONACLEX                     TABS       5MG    2.2.1
> 50                   MESORB                         DRESS 10CMX10CM   20.3.1
> 160 ABSORBENT CELLULOSE MESO            DRESS 10CMX10CM   20.3.1
> 161 ABSORBENT CELLULOSE MESO            DRESS 10CMX15CM   20.3.1
> 164 ABSORBENT CELLULOSE MESO            DRESS 20CMX25CM   20.3.1
> 200                  SEPTRIN                        TABS     480MG    5.1.8
> 210          SEPTRIN PAED SF                    SUSP 240MG/5ML    5.1.8
> 212            SEPTRIN ADULT                     SUSP 480MG/5ML    5.1.8
> 220            SEPTRIN FORTE                     TABS     960MG    5.1.8
>  etc
>
>
> contains all the information for the item codes
> y was read in in the same way.
>
> I then used the following code:
>
> z  <- sqldf("select * from x left join y using (code)")
>
> when I use this on my real data I get an output:
>  prochi prescribed_date dataMonth item_code res_seqno quantity directions
> 1  CAO713      22/06/2001      NULL    842752      NULL       60
> 1/D
> 2  CAO713      28/04/2000      NULL      7800      NULL     100G
> A/TD
> 3  CAO713      10/04/2000      NULL    842652      NULL       60
> 1/D
> 4  CAO713      03/07/2000      NULL    842652      NULL       60
> 1/D
> 5  CAO713      09/01/2001      NULL    842752      NULL       60
> 1/D
> 6  CAO713      16/10/2001      NULL    842752      NULL       60
> 1/D
> 7  CAO713      16/08/2001      NULL    842752      NULL       60
> 1/D
> 8  CAO713      17/09/1993      NULL     39620      NULL      5ML
> NIL
> 9  CAO713      01/05/2001      NULL    842752      NULL       60
> 1/D
> 10 CAO713      05/03/2001      NULL    842752      NULL       60
> 1/D
>   no_of_packs datasource scan_ref_no         name formulation_code strength
> 1         NULL        TSF        NULL                           
> 2         NULL        TSF        NULL BETNOVATE RD             OINT   0.025%
> 3         NULL        TSF        NULL                           
> 4         NULL        TSF        NULL                           
> 5         NULL        TSF        NULL                           
> 6         NULL        TSF        NULL                           
> 7         NULL        TSF        NULL                           
> 8         NULL        TSF        NULL   GAMMABULIN              INJ    320MG
> 9         NULL        TSF        NULL                           
> 10        NULL        TSF        NULL                           
>   bnf_code
> 1      
> 2  13.4.1.2
> 3      
> 4      
> 5      
> 6      
> 7      
> 8      14.5
> 9      
> 10     
>
>
> There is absolutely no reason for there to be  anywhere as the
> information for both the tables is complete.
>
> Not sure what the problem is?
>
> Thanks,
> Natalie
> --
> View this message in context: 
> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>