Re: [R] merge function

2015-06-01 Thread Bert Gunter
You do not appear to understand what merge() does. Go through the worked
examples in ?merge so that you do.

FWIW, I would agree that the Help file is cryptic and difficult to
understand. Perhaps going through a tutorial on database join operations
might help.

Cheers,
Bert

Bert Gunter

Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom.
   -- Clifford Stoll

On Mon, Jun 1, 2015 at 7:47 AM, carol white via R-help r-help@r-project.org
 wrote:

 I understood that by would take the intersection of names(x) and names(y),
 names(x) being the column names of x and names(y), column names of y.
 if x has 5 col and the col names of x are col1, col2... col5 and y has 3
 col and their names are col1, col2, col3, I thought that the merged data
 set will have 3 col, namely col1, col2, col3 but all 5 col, i.e. col1,
 col2... col5 are taken if nothing is specified for the by arg.
 Cheers,



  On Monday, June 1, 2015 4:32 PM, Michael Dewey 
 li...@dewey.myzen.co.uk wrote:




 On 01/06/2015 14:46, carol white via R-help wrote:
  Hi,By default the merge function should take the intersection of column
 names

   (if this is understood from by = intersect(names(x), names(y)),

 Dear Carol
 The by parameter specifies which columns are used to merge by. Did you
 understand it to be which columns are retained in the result?

 Just a hunch, and if not then you need to give us a toy example.



   but it takes all columns. How to specify the intersection of column
 names?
   Thanks
  Carol
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 --
 Michael
 http://www.dewey.myzen.co.uk/home.html





Re: [R] merge function

2015-06-01 Thread John Kane
Let me try this again. Here are the links I forgot. My apologies.
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 and http://adv-r.had.co.nz/Reproducibility.html

John Kane
Kingston ON Canada


 -Original Message-
 From: jrkrid...@inbox.com
 Sent: Mon, 1 Jun 2015 06:29:41 -0800
 To: wht_...@yahoo.com, r-help@r-project.org
 Subject: RE: [R] merge function
 
 As Bert says, it is not exactly clear what you want, but is something like
 this what you are looking for?
 
 dat1 <- data.frame(aa = c("a", "b", "c"), bb = 1:3)
 dat2 <- data.frame(xx = c("b", "c", "d"), yy = 3:1)
 merge(dat1, dat2, by.x = "aa", by.y = "xx")
 
 For further reference here are some suggestions about asking questions on
 the R-help list.  In particular it is very helpful if data is supplied in
 dput() form (See ?dput for details)
 
 John Kane
 Kingston ON Canada
 
 
 -Original Message-
 From: r-help@r-project.org
 Sent: Mon, 1 Jun 2015 13:46:15 + (UTC)
 To: r-help@r-project.org
 Subject: [R] merge function
 
 Hi,By default the merge function should take the intersection of column
 names (if this is understood from by = intersect(names(x), names(y)),
 but
 it takes all columns. How to specify the intersection of column names?
  Thanks
 Carol
 

Re: [R] merge function

2015-06-01 Thread carol white via R-help
I understood that by would take the intersection of names(x) and names(y), 
names(x) being the column names of x and names(y) the column names of y.
If x has 5 columns named col1, col2, ..., col5 and y has 3 columns named 
col1, col2, col3, I thought that the merged data set would have 3 columns, 
namely col1, col2, col3. Instead, all 5 columns (col1, col2, ..., col5) are 
kept if nothing is specified for the by argument.
Cheers,
 


 On Monday, June 1, 2015 4:32 PM, Michael Dewey li...@dewey.myzen.co.uk 
wrote:
   

 

On 01/06/2015 14:46, carol white via R-help wrote:
 Hi,By default the merge function should take the intersection of column names

  (if this is understood from by = intersect(names(x), names(y)),

Dear Carol
The by parameter specifies which columns are used to merge by. Did you 
understand it to be which columns are retained in the result?

Just a hunch, and if not then you need to give us a toy example.



  but it takes all columns. How to specify the intersection of column names?
  Thanks
 Carol



-- 
Michael
http://www.dewey.myzen.co.uk/home.html


  

Re: [R] merge function

2015-06-01 Thread Bert Gunter
1. Please read and follow the posting guide.

2. Reproducible example? (... at least I don't understand what you mean)

3. Plain text, not HTML.

Cheers,
Bert

Bert Gunter

Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom.
   -- Clifford Stoll

On Mon, Jun 1, 2015 at 6:46 AM, carol white via R-help r-help@r-project.org
 wrote:

 Hi,By default the merge function should take the intersection of column
 names (if this is understood from by = intersect(names(x), names(y)), but
 it takes all columns. How to specify the intersection of column names?
  Thanks
 Carol



Re: [R] merge function

2015-06-01 Thread Michael Dewey



On 01/06/2015 14:46, carol white via R-help wrote:

Hi,By default the merge function should take the intersection of column names


 (if this is understood from by = intersect(names(x), names(y)),

Dear Carol
The by parameter specifies which columns are used to merge by. Did you 
understand it to be which columns are retained in the result?


Just a hunch, and if not then you need to give us a toy example.
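
A minimal sketch of the distinction, assuming two made-up data frames `x` and `y` (hypothetical names and values, chosen to mirror the five-column/three-column case described in this thread). It suggests that the default `by` only decides which columns are used to match rows, while the result still carries every column; subsetting to the shared names before merging is one possible way to keep just those columns.

```r
# Hypothetical toy data: x has five columns, y has three of the same names.
x <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"), col3 = c(10, 20, 30),
                col4 = c(TRUE, FALSE, TRUE), col5 = c("u", "v", "w"))
y <- data.frame(col1 = 2:4, col2 = c("b", "c", "d"), col3 = c(20, 30, 40))

# Default: by = intersect(names(x), names(y)), i.e. col1, col2, col3.
# These columns are used to match rows; every other column is carried along.
m <- merge(x, y)
names(m)   # col1 col2 col3 col4 col5
nrow(m)    # 2: only key combinations present in both x and y

# To keep only the shared columns, subset both inputs before merging.
common <- intersect(names(x), names(y))
m_common <- merge(x[common], y[common])
names(m_common)   # col1 col2 col3
```

Subsetting before the merge (rather than after) also avoids carrying wide, unneeded columns through the join.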



 but it takes all columns. How to specify the intersection of column names?

  Thanks
Carol




--
Michael
http://www.dewey.myzen.co.uk/home.html



Re: [R] merge function

2015-06-01 Thread John Kane
As Bert says, it is not exactly clear what you want, but is something like this 
what you are looking for?

dat1 <- data.frame(aa = c("a", "b", "c"), bb = 1:3)
dat2 <- data.frame(xx = c("b", "c", "d"), yy = 3:1)
merge(dat1, dat2, by.x = "aa", by.y = "xx")

For further reference here are some suggestions about asking questions on the 
R-help list.  In particular it is very helpful if data is supplied in dput() 
form (See ?dput for details)
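
As a rough illustration of the dput() suggestion, here is a small sketch using a made-up data frame (`dat1` below is hypothetical, not the poster's data). dput() prints an R expression that rebuilds the object, so a reader can paste it straight into a session.

```r
# Hypothetical small data frame standing in for the poster's real data.
dat1 <- data.frame(aa = c("a", "b", "c"), bb = 1:3)

# dput() prints an R expression that recreates the object;
# that printed text is what gets pasted into the e-mail.
dput(dat1)

# Round trip: evaluating the deparsed text rebuilds an equivalent object.
txt <- paste(capture.output(dput(dat1)), collapse = "\n")
dat1_copy <- eval(parse(text = txt))
all.equal(dat1, dat1_copy)   # TRUE
```

For a large object, something like `dput(head(dat1, 20))` keeps the posted output short.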

John Kane
Kingston ON Canada


 -Original Message-
 From: r-help@r-project.org
 Sent: Mon, 1 Jun 2015 13:46:15 + (UTC)
 To: r-help@r-project.org
 Subject: [R] merge function
 
 Hi,By default the merge function should take the intersection of column
 names (if this is understood from by = intersect(names(x), names(y)), but
 it takes all columns. How to specify the intersection of column names?
  Thanks
 Carol
 

Re: [R] merge function

2015-06-01 Thread John Kane
Exactly what I thought too the first time I read ?merge. R sometimes has its 
own approach.

John Kane
Kingston ON Canada


 -Original Message-
 From: r-help@r-project.org
 Sent: Mon, 1 Jun 2015 14:47:07 + (UTC)
 To: li...@dewey.myzen.co.uk, r-help@r-project.org
 Subject: Re: [R] merge function
 
 I understood that by would take the intersection of names(x) and
 names(y), names(x) being the column names of x and names(y), column names
 of y.
 if x has 5 col and the col names of x are col1, col2... col5 and y has 3
 col and their names are col1, col2, col3, I thought that the merged data
 set will have 3 col, namely col1, col2, col3 but all 5 col, i.e. col1,
 col2... col5 are taken if nothing is specified for the by arg.
 Cheers,
 
 
 
  On Monday, June 1, 2015 4:32 PM, Michael Dewey
 li...@dewey.myzen.co.uk wrote:
 
 
 
 
 On 01/06/2015 14:46, carol white via R-help wrote:
 Hi,By default the merge function should take the intersection of column
 names
 
   (if this is understood from by = intersect(names(x), names(y)),
 
 Dear Carol
 The by parameter specifies which columns are used to merge by. Did you
 understand it to be which columns are retained in the result?
 
 Just a hunch, and if not then you need to give us a toy example.
 
 
 
   but it takes all columns. How to specify the intersection of column
 names?
   Thanks
 Carol
 
 
 
 --
 Michael
 http://www.dewey.myzen.co.uk/home.html
 
 
 

Re: [R] merge function to combine two tables

2013-03-14 Thread John Kane
See below, inline.

John Kane
Kingston ON Canada


 -Original Message-
 From: michael.eisenr...@gmx.ch
 Sent: Thu, 14 Mar 2013 11:51:49 +0100
 To: r-help@r-project.org
 Subject: [R] merge function to combine two tables
 
 Dear R-help members
 
 I would be grateful if anyone could help me with the following problem:
 
 I would like to combine two matrices (Schmitt_15 and Schmitt_16, they are
 attached) which have a  species presence/absence x sampling plot
 structure. The aim would be to have in the end only one matrix which
 shows all existing species and their presence/absence on all the
 different plots.
 To do this I used the merge function in R.
 The problem is that my matrix in the end shows only 12 species (but there
 are in total about 100!). I don't know why.
 
 I used the following commands:
 
 
 Schmitt_15
 Schmitt_16
 output <- merge(Schmitt_15, Schmitt_16, by = "species")

# You seem to be picking out only the species common to the two data.frames

ncol(output)
length(unique(output$species))
Schmitt_15$species %in% Schmitt_16$species

# This may do what you want. It takes every species name found in either
# file. Is that what you want?

newdat <- merge(Schmitt_15, Schmitt_16, by = "species", all = TRUE)



This gives me a merged file with 

# You seem to have missed a step here since there is no ab object in your code.
 write.table(ab, file = "output.txt", sep = ",")
 
 Can anyone help me?
 Thank you very much!
 Michael
 
 
 


Re: [R] merge function to combine two tables

2013-03-14 Thread jim holtman
Take a look at your data.  When I loaded what you attached, there were only
9 species that were in common across the two files:

> dim(s16)
[1] 226  83
> dim(s15)
[1] 96 41
> sum(s15$species %in% s16$species)
[1] 10
> sum(s16$species %in% s15$species)
[1] 10
> length(intersect(s16$species, s15$species))
[1] 9
> length(unique(s16$species))
[1] 173
> length(unique(s15$species))
[1] 90
> x <- merge(s16, s15, by = 'species')
> dim(x)
[1]  12 123


so it is not surprising you got the result that you did.
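
A small sketch of why the counts above can differ, assuming made-up miniature tables (the `s15`/`s16` values below are hypothetical, not the attached files). With the default inner join, only shared species survive, and a species that appears more than once on both sides is paired with every matching row, so the merged row count can exceed the number of distinct shared species. Passing `all = TRUE` keeps species found in either table.

```r
# Hypothetical miniature versions of the two species tables.
s15 <- data.frame(species = c("A", "B", "B", "C"), plot1 = c(1, 0, 1, 1))
s16 <- data.frame(species = c("B", "B", "C", "D"), plot2 = c(1, 1, 0, 1))

length(intersect(s15$species, s16$species))  # 2 distinct shared species: B, C
sum(s15$species %in% s16$species)            # 3 rows of s15 have a match

# Inner join (the default): only shared species, and each duplicated key
# is paired with every matching row on the other side (2 x 2 for "B").
inner <- merge(s15, s16, by = "species")
nrow(inner)   # 5

# Outer join: all = TRUE keeps species found in either table and fills
# the missing presence/absence cells with NA.
outer <- merge(s15, s16, by = "species", all = TRUE)
nrow(outer)   # 7
```

If each species should appear once, duplicated species rows would need to be collapsed before merging.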



On Thu, Mar 14, 2013 at 6:51 AM, Michael Eisenring michael.eisenr...@gmx.ch
 wrote:

 Dear R-help members

 I would be grateful if anyone could help me with the following problem:

 I would like to combine two matrices (Schmitt_15 and Schmitt_16, they are
 attached) which have a  species presence/absence x sampling plot structure.
 The aim would be to have in the end only one matrix which shows all
 existing species and their presence/absence on all the different plots.
 To do this I used the merge function in R.
 The problem is that my matrix in the end shows only 12 species (but there
 are in total about 100!). I don't know why.

 I used the following commands:


 Schmitt_15
 Schmitt_16
 output <- merge(Schmitt_15, Schmitt_16, by = "species")
 write.table(ab, file = "output.txt", sep = ",")

 Can anyone help me?
 Thank you very much!
 Michael








-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] merge function while obviating duplicate columns XXXX

2013-03-11 Thread Ista Zahn
On Mon, Mar 11, 2013 at 3:17 PM, Dan Abner dan.abne...@gmail.com wrote:
 Hi everyone,

 I have the following call to the merge() function. How does one
 prevent duplicate columns in the resulting data frame that the 2
 parent data frames have in common but are not true key or by
 variables?


 data3 <- merge(data1, data2, by = "id")
 data3

 id total.x total.y balance
  1      78      78      90
  2      91      91      63
  3      74      74      57
  4      89      89      58
  5      90      90      27


 In this example, total is not a true key or by variable that
 uniquely identifies rows suitable for matching purposes, but instead
 just happens to be common to both sets.

Well, which one do you want? Or do you want to exclude total from the result?


 In reality, I have hundreds for these in common variables, so I need
 a solution that is tractable for a large number of in common
 columns.

 Thanks!

 Dan



Re: [R] merge function while obviating duplicate columns XXXX

2013-03-11 Thread Dan Abner
Ok, let's say I only want the common columns from data1. Is there a
succinct way of doing this for potentially hundreds of in common
columns?





Re: [R] merge function while obviating duplicate columns XXXX

2013-03-11 Thread Jeff Newmiller
intersect(names(data1),names(data2))
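
One possible way to put that expression to work, sketched with made-up data modelled on the example in the question (`data1`, `data2`, and their values are assumptions). The idea is to find the shared non-key columns with intersect() and drop them from `data2` before merging, so only `data1`'s copies remain and no `.x`/`.y` suffixes appear.

```r
# Hypothetical data modelled on the example in the question.
data1 <- data.frame(id = 1:5, total = c(78, 91, 74, 89, 90))
data2 <- data.frame(id = 1:5, total = c(78, 91, 74, 89, 90),
                    balance = c(90, 63, 57, 58, 27))

key <- "id"
# Columns the two data frames share, other than the key.
dup <- setdiff(intersect(names(data1), names(data2)), key)

# Drop those shared columns from data2 so only data1's copies survive.
data3 <- merge(data1, data2[, setdiff(names(data2), dup), drop = FALSE],
               by = key)
names(data3)   # id total balance
```

Because `dup` is computed from the column names, the same lines should work unchanged for hundreds of shared columns.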
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.



Re: [R] merge function while obviating duplicate columns XXXX

2013-03-11 Thread William Dunlap
You can use the set-oriented functions setdiff(), union(), and intersect().
E.g., setdiff(colnames(data2), colnames(data1)) gives the names of columns
of data2 that are not names of columns of data1.  The following might be
what you want:
  merge(data1, data2[, c("id", setdiff(colnames(data2), colnames(data1)))], by = "id")
You didn't give an example of the data nor the desired result so I made some up:
  > data1 <- data.frame(id=c(1,1,2,3), Name=c("Joe","Joe","Ken","Leo"))
  > data2 <- data.frame(id=c(2,3), Name=c("Melody","Nell"), Age=c(45,49))
  > merge(data1, data2, by="id")
    id Name.x Name.y Age
  1  2    Ken Melody  45
  2  3    Leo   Nell  49
  > merge(data1, data2[, c("id", setdiff(colnames(data2),colnames(data1)))], by="id")
    id Name Age
  1  2  Ken  45
  2  3  Leo  49

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




Re: [R] Merge function - Return NON matches

2012-04-27 Thread RHelpPlease
Hi there,
I've tried the noted solutions:

If you do `no <- unlist(hrc_78_clm_no)`, do you get a character vector 
of claim numbers you want to exclude? If so, then `subset(whatever, 
!CLAIM_NO %in% no)` should work.

I converted the CLAIM_NO list to a character, with

> hrc78_clmno_char <- format(as.character(hrc78_clm_no))
> is.character(hrc78_clmno_char)
[1] TRUE

Then I applied your code (above), which didn't work.  Thanks though!

Thanks for the dput() help.  Here is truncated output of the list (its class
is data.frame, I call it a list for communication sake) & data.frame. 
Again, your help is most appreciated!

Goal: merge the list & data.frame together.  Output the data.frame, but with
rows where the CLAIM_NO variable between the list & data.frame *do not
match*.

*The List*
truncated_list <- hrc78_clm_no[1:100,] # So you can see consistency in previously-mentioned variables
truncated_list <- structure(list(CLAIM_NO = c(20L, 83L, 1440L, 4439L, 7002L,
9562L, 10463L, 12503L, 16195L, 
22987L, 30760L, 32108L, 32640L, 33045L, 36241L, 37091L, 37934L, 
38663L, 39456L, 40544L, 40630L, 40679L, 40734L, 43054L, 53483L, 
54155L, 56151L, 58113L, 61050L, 62056L, 63014L, 68486L, 68541L, 
69298L, 69983L, 73379L, 76810L, 79975L, 91124L, 97697L, 100524L, 
105808L, 112659L, 112955L, 113422L, 114522L, 124159L, 133566L, 
135167L, 137387L, 137954L, 138186L, 144574L, 148573L, 150013L, 
152193L, 154680L, 155414L, 165954L, 171223L, 175077L, 176359L, 
177656L, 178155L, 182250L, 182393L, 182832L, 184245L, 185542L, 
186038L, 186087L, 186098L, 186294L, 186550L, 186897L, 187025L, 
190180L, 191472L, 192593L, 196207L, 196689L, 197372L, 197537L, 
197590L, 197730L, 197874L, 198294L, 198750L, 198823L, 199076L, 
199233L, 199284L, 199468L, 199661L, 199913L, 200150L, 200279L, 
200473L, 200927L, 202407L), .Names = c("CLAIM_NO"), class = "data.frame"))

*The (multi-column) data.frame, but greatly truncated*
truncated_dataframe <- bestPartAreadmin[1:25, 1:4]
truncated_dataframe <- structure(list(DESY_SORT_KEY = c(10193L,
10193L, 10193L, 
10574L, 10574L, 19213L, 19213L, 19213L, 100026636L, 
100040718L, 100055111L, 100060558L, 100060558L, 100060558L, 100072978L, 
100096346L, 100130451L, 100168782L, 100168782L, 100168782L, 100168782L, 
100168782L, 100168782L, 100174887L, 100177905L), PRVDR_NUM =
structure(c(1368L, 
1353L, 1406L, 149L, 149L, 1362L, 1393L, 1367L, 1557L, 1370L, 
1360L, 1362L, 1362L, 1362L, 1372L, 1358L, 193L, 196L, 196L, 61L, 
166L, 196L, 196L, 311L, 1363L), .Label = c(010001, 010006, 
010015, 010016, 010029, 010033, 010034, 010035, 010039, 
010040, 010046, 010049, 010083, 010092, 010108, 010131, 
010149, 01S001, 01S033, 01S046, 01S145, 020001, 020006, 
020012, 020017, 021306, 021311, 030002, 030006, 030007, 
030010, 030011, 030012, 030013, 030014, 030016, 030023, 
030024, 030030, 030033, 030036, 030037, 030038, 030043, 
030055, 030061, 030062, 030064, 030065, 030067, 030069, 
030078, 030083, 030085, 030087, 030088, 030089, 030092, 
030093, 030100, 030101, 030102, 030103, 030105, 030108, 
030110, 030111, 030114, 030115, 030117, 030118, 030119, 
030120, 030121, 030122, 030123, 030126, 030128, 031300, 
031305, 031311, 032000, 032001, 032002, 032006, 033025, 
033028, 033029, 033032, 033034, 033036, 034004, 034013, 
034020, 034024, 03S002, 03S006, 03S007, 03S016, 03S022, 
03S023, 03S089, 03T002, 03T055, 03T061, 03T069, 03T093, 
03T103, 03T114, 03T117, 03T126, 040004, 040007, 040010, 
040011, 040016, 040022, 040026, 040027, 040029, 040036, 
040041, 040047, 040055, 040062, 040072, 040080, 040084, 
040088, 040091, 040114, 040118, 040119, 043028, 044005, 
04S027, 04S084, 04T041, 04T062, 04T119, 050002, 050006, 
050007, 050008, 050009, 050013, 050014, 050016, 050017, 
050018, 050022, 050024, 050025, 050026, 050030, 050036, 
050038, 050039, 050040, 050042, 050043, 050045, 050046, 
050047, 050055, 050056, 050057, 050058, 050060, 050063, 
050069, 050070, 050071, 050073, 050075, 050076, 050077, 
050078, 050079, 050082, 050084, 050089, 050090, 050091, 
050093, 050099, 050100, 050101, 050102, 050103, 050104, 
050107, 050108, 050110, 050111, 050112, 050113, 050115, 
050116, 050118, 050121, 050122, 050124, 050125, 050126, 
050128, 050129, 050131, 050132, 050133, 050135, 050136, 
050137, 050138, 050139, 050140, 050145, 050146, 050149, 
050150, 050152, 050153, 050158, 050159, 050168, 050169, 
050174, 050179, 050180, 050188, 050191, 050193, 050195, 
050196, 050197, 050204, 050211, 050219, 050222, 050224, 
050225, 050226, 050228, 050230, 050231, 050232, 050234, 
050235, 050236, 050238, 050239, 050242, 050243, 050245, 
050248, 050254, 050257, 050261, 050262, 050264, 050272, 
050276, 050277, 050278, 050279, 050280, 050283, 050289, 
050290, 050291, 050292, 050295, 050296, 050298, 050300, 
050301, 050305, 050308, 050309, 050313, 050315, 050320, 
050324, 050327, 050329, 050334, 050335, 050336, 050342, 
050348, 050351, 050352, 050353, 050359, 050360, 050366, 
050367, 050373, 050376, 050378, 050380, 050382, 050385, 
050390, 050393, 

Re: [R] Merge function - Return NON matches

2012-04-27 Thread Petr PIKAL
Hi

If you used shorter names for your objects, you would probably get more 
readable advice.

Is this what you wanted?

truncated_dataframe[truncated_dataframe$CLAIM_NO %in% 
setdiff(truncated_dataframe$CLAIM_NO, truncated_list$CLAIM_NO),]
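
A minimal sketch of returning the non-matching rows, using made-up stand-ins for the claims data frame and the exclusion list (`claims`, `exclude`, `in_list`, and all values are hypothetical). The first form is the negated `%in%` filter, which selects the same rows as the setdiff() expression above. The second shows one way to get the same rows through merge(), by flagging the list and keeping the rows left unflagged after a left join.

```r
# Hypothetical stand-ins for the claims data frame and the exclusion list.
claims <- data.frame(CLAIM_NO = c(20L, 21L, 83L, 99L),
                     PRVDR_NUM = c("010001", "010006", "010015", "010016"))
exclude <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Direct approach: keep rows whose CLAIM_NO is absent from the list.
non_match <- claims[!claims$CLAIM_NO %in% exclude$CLAIM_NO, ]

# merge()-based approach: flag the list rows, left-join, keep the unflagged.
exclude$in_list <- TRUE
joined <- merge(claims, exclude, by = "CLAIM_NO", all.x = TRUE)
non_match2 <- joined[is.na(joined$in_list), names(claims)]

non_match$CLAIM_NO    # 21 99
non_match2$CLAIM_NO   # 21 99
```

The `%in%` form avoids the join entirely and keeps the original row order, so it is usually the simpler choice when only a filter is needed.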

Regards
Petr

 
 Hi there,
 I've tried the noted solutions:
 
 If you do `no - unlist(hrc_78_clm_no`, do you get a character vector 
 of claim numbers you want to exclude? If so, then `subset(whatever, 
 !CLAIM_NO %in% no)` should work.
 
 I converted the CLAIM_NO list to a character, with
 
  hrc78_clmno_char - format(as.character(hrc78_clm_no))
  is.character(hrc78_clmno_char)
 [1] TRUE
 
 Then I applied your code (above), which didn't work.  Thanks though!
 
 Thanks for the dput() help.  Here is truncated output of the list (its 
class
 is data.frame, I call it a list for communication sake)  data.frame. 
 Again, your help is most appreciated!
 
 Goal: merge the list  data.frame together.  Output the data.frame, but 
with
 rows where the CLAIM_NO variable between the list  data.frame *do not
 match*.
 
 *The List*
 truncated_list - hrc78_clm_no[1:100,] #So you can see consistency in
 previously-mentioned variables
 truncated_list - structure(list(CLAIM_NO = c(20L, 83L, 1440L, 4439L, 
7002L,
 9562L, 10463L, 12503L, 16195L, 
 22987L, 30760L, 32108L, 32640L, 33045L, 36241L, 37091L, 37934L, 
 38663L, 39456L, 40544L, 40630L, 40679L, 40734L, 43054L, 53483L, 
 54155L, 56151L, 58113L, 61050L, 62056L, 63014L, 68486L, 68541L, 
 69298L, 69983L, 73379L, 76810L, 79975L, 91124L, 97697L, 100524L, 
 105808L, 112659L, 112955L, 113422L, 114522L, 124159L, 133566L, 
 135167L, 137387L, 137954L, 138186L, 144574L, 148573L, 150013L, 
 152193L, 154680L, 155414L, 165954L, 171223L, 175077L, 176359L, 
 177656L, 178155L, 182250L, 182393L, 182832L, 184245L, 185542L, 
 186038L, 186087L, 186098L, 186294L, 186550L, 186897L, 187025L, 
 190180L, 191472L, 192593L, 196207L, 196689L, 197372L, 197537L, 
 197590L, 197730L, 197874L, 198294L, 198750L, 198823L, 199076L, 
 199233L, 199284L, 199468L, 199661L, 199913L, 200150L, 200279L, 
 200473L, 200927L, 202407L)), .Names = c("CLAIM_NO"), class = "data.frame")
 
 *The (multi-column) data.frame, but greatly truncated*
 truncated_dataframe <- bestPartAreadmin[1:25, 1:4]
 truncated_dataframe <- structure(list(DESY_SORT_KEY = c(10193L,
 10193L, 10193L, 
 10574L, 10574L, 19213L, 19213L, 19213L, 100026636L, 
 100040718L, 100055111L, 100060558L, 100060558L, 100060558L, 100072978L, 
 100096346L, 100130451L, 100168782L, 100168782L, 100168782L, 100168782L, 
 100168782L, 100168782L, 100174887L, 100177905L), PRVDR_NUM =
 structure(c(1368L, 
 1353L, 1406L, 149L, 149L, 1362L, 1393L, 1367L, 1557L, 1370L, 
 1360L, 1362L, 1362L, 1362L, 1372L, 1358L, 193L, 196L, 196L, 61L, 
 166L, 196L, 196L, 311L, 1363L), .Label = c(010001, 010006, 
 010015, 010016, 010029, 010033, 010034, 010035, 010039, 
 010040, 010046, 010049, 010083, 010092, 010108, 010131, 
 010149, 01S001, 01S033, 01S046, 01S145, 020001, 020006, 
 020012, 020017, 021306, 021311, 030002, 030006, 030007, 
 030010, 030011, 030012, 030013, 030014, 030016, 030023, 
 030024, 030030, 030033, 030036, 030037, 030038, 030043, 
 030055, 030061, 030062, 030064, 030065, 030067, 030069, 
 030078, 030083, 030085, 030087, 030088, 030089, 030092, 
 030093, 030100, 030101, 030102, 030103, 030105, 030108, 
 030110, 030111, 030114, 030115, 030117, 030118, 030119, 
 030120, 030121, 030122, 030123, 030126, 030128, 031300, 
 031305, 031311, 032000, 032001, 032002, 032006, 033025, 
 033028, 033029, 033032, 033034, 033036, 034004, 034013, 
 034020, 034024, 03S002, 03S006, 03S007, 03S016, 03S022, 
 03S023, 03S089, 03T002, 03T055, 03T061, 03T069, 03T093, 
 03T103, 03T114, 03T117, 03T126, 040004, 040007, 040010, 
 040011, 040016, 040022, 040026, 040027, 040029, 040036, 
 040041, 040047, 040055, 040062, 040072, 040080, 040084, 
 040088, 040091, 040114, 040118, 040119, 043028, 044005, 
 04S027, 04S084, 04T041, 04T062, 04T119, 050002, 050006, 
 050007, 050008, 050009, 050013, 050014, 050016, 050017, 
 050018, 050022, 050024, 050025, 050026, 050030, 050036, 
 050038, 050039, 050040, 050042, 050043, 050045, 050046, 
 050047, 050055, 050056, 050057, 050058, 050060, 050063, 
 050069, 050070, 050071, 050073, 050075, 050076, 050077, 
 050078, 050079, 050082, 050084, 050089, 050090, 050091, 
 050093, 050099, 050100, 050101, 050102, 050103, 050104, 
 050107, 050108, 050110, 050111, 050112, 050113, 050115, 
 050116, 050118, 050121, 050122, 050124, 050125, 050126, 
 050128, 050129, 050131, 050132, 050133, 050135, 050136, 
 050137, 050138, 050139, 050140, 050145, 050146, 050149, 
 050150, 050152, 050153, 050158, 050159, 050168, 050169, 
 050174, 050179, 050180, 050188, 050191, 050193, 050195, 
 050196, 050197, 050204, 050211, 050219, 050222, 050224, 
 050225, 050226, 050228, 050230, 050231, 050232, 050234, 
 050235, 050236, 050238, 050239, 050242, 050243, 050245, 
 050248, 050254, 050257, 050261, 050262, 050264, 050272, 
 050276, 

Re: [R] Merge function - Return NON matches

2012-04-27 Thread RHelpPlease
Hi again,
Petr, your solution worked!

Thanks everyone for your input.  I'll look more into setdiff.

Cheers!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4593101.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge function - Return NON matches

2012-04-26 Thread Steve Lianoglou
Hi,

To increase the chances of you getting help on this one, please give
example data (a small data.frame, a small list) that you are trying to
do this on, and also show the desired output. Whip these variables up
in your R workspace and paste the output of `dput` for each into your
follow up email.

It's hard (for me, anyways) to get what you're after ... I'm guessing
something that ends up looking like this will end up being one
solution:

subset(your.df, !CLAIM_NO %in% `something`)

but it's hard for me to tell from where I'm sitting.

-steve


On Thu, Apr 26, 2012 at 3:33 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
 Hi there,
 I wish to merge a common variable between a list and a data.frame & return
 rows via the data.frame where there is NO match.  Here are some details:

 The list, where the variable/col.name = CLAIM_NO
 CLAIM_NO
 20
 83
 1440
 4439
 7002
 ...

 > dim(hrc78_clm_no)
 [1] 6678    1

 The data.frame, where there exists a variable with the same name, CLAIM_NO.
 > dim(bestPartAreadmin)
 [1] 13068    93

 I wish to merge the two together & only return a data.frame where there is
 NO match in the CLAIM_NO between both files.

 I've read & tried code via the merge function.  If merge can do this,
 I'm missing something with the available options.

 I'm figuring something like:

 clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO",  ..
 .. ..)

 Your help is most appreciated!



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590755.html



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Merge function - Return NON matches

2012-04-26 Thread RHelpPlease
Hi Steve,
Thanks for replying.  Here's a small piece of the data.frame:

> bestPartAreadmin[1:5,1:6]
  DESY_SORT_KEY PRVDR_NUM CLM_THRU_DT CLAIM_NO NCH_NEAR_LINE_REC_IDEN_CD NCH_CLM_TYPE_CD
1         10193    290003    20090323       20                         V              60
2         10193    290045    20091124       21                         V              60
3         10193    29T003    20090401       22                         V              60
4         10574    050017    20090527       83                         V              60
5         10574    050017    20090921       84                         V              60

There are 93 columns total in the data.frame, so these are the first six,
where you can see CLAIM_NO.

I wish for the resultant data.frame to look just like the data.frame
above, but values for CLAIM_NO (above) are those that differ/don't match the
corresponding CLAIM_NO values in the list (hrc78_clm_no).

Does this help?

Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590810.html


Re: [R] Merge function - Return NON matches

2012-04-26 Thread RHelpPlease
Hi again,
I tried the sample code like this:

> merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
> dim(merged_clmno)
[1] 13068    93

Note that:
> dim(bestPartAreadmin)
[1] 13068    93

So, no change between the original data.frame (bestPartAreadmin) & the
(should be) less-rows merged_clmno data.frame.

Any further help is most appreciated!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590851.html


Re: [R] Merge function - Return NON matches

2012-04-26 Thread Sarah Goslee
You'd get better help if you actually did as Steve requested and
provided sample data (a reproducible example!) using dput().

But since you didn't:

> fakedata <- data.frame(a = 1:5, b=11:15, c=c(1,1,1,2,2))
> fakedata
  a  b c
1 1 11 1
2 2 12 1
3 3 13 1
4 4 14 2
5 5 15 2
> notb <- c(12, 14, 15)
> subset(fakedata, !b %in% notb)
  a  b c
1 1 11 1
3 3 13 1

Since you say that doesn't work for you, you absolutely have to
provide us with a reproducible example for anyone to be able to
diagnose your problem.

Sarah

On Thu, Apr 26, 2012 at 4:12 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
 Hi again,
 I tried the sample code like this:

 > merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
 > dim(merged_clmno)
 [1] 13068    93

 Note that:
 > dim(bestPartAreadmin)
 [1] 13068    93

 So, no change between the original data.frame (bestPartAreadmin) & the
 (should be) less-rows merged_clmno data.frame.

 Any further help is most appreciated!


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Merge function - Return NON matches

2012-04-26 Thread Steve Lianoglou
Hi,

As Sarah reiterated -- it'd *really* be helpful if you give us data we
can actually work with.

That having been said:

On Thu, Apr 26, 2012 at 4:12 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
 Hi again,
 I tried the sample code like this:

 > merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
 > dim(merged_clmno)
 [1] 13068    93

 Note that:
 > dim(bestPartAreadmin)
 [1] 13068    93

 So, no change between the original data.frame (bestPartAreadmin) & the
 (should be) less-rows merged_clmno data.frame.

Your original email said you had a list that contains the CLAIM_NOs
you want to exclude.

Is `hrc78_clm_no` this list -- does it only have CLAIM_NOs? Passing
a list into the subset call after `%in%` won't work.

If you do `no <- unlist(hrc78_clm_no)`, do you get a character vector
of claim numbers you want to exclude? If so, then `subset(whatever,
!CLAIM_NO %in% no)` should work.
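
To illustrate the point about passing a list after `%in%`, here is a small sketch on made-up data (claims_df and exclude_df are hypothetical names, not the poster's objects). With a whole data.frame on the right-hand side, the comparison is made against the columns as units rather than the values inside them, which is consistent with the unchanged row count reported in the thread.

```r
# Hypothetical toy data; the real objects in the thread are much larger.
claims_df  <- data.frame(CLAIM_NO = c(20L, 21L, 22L, 83L, 84L))
exclude_df <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Whole data.frame after %in%: the values inside the column are not
# compared individually, so typically no row is dropped.
kept_wrong <- subset(claims_df, !CLAIM_NO %in% exclude_df)

# Flattening to a plain vector first compares value against value.
exclude_ids <- unlist(exclude_df, use.names = FALSE)
kept_right  <- subset(claims_df, !CLAIM_NO %in% exclude_ids)

nrow(kept_wrong)
nrow(kept_right)
```

Using exclude_df$CLAIM_NO in place of unlist() would do the same job when only one column is involved.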

HTH,
-steve



 Any further help is most appreciated!

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590851.html



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Merge function - Return NON matches

2012-04-26 Thread RHelpPlease
Hi there,
Thanks for your responses.  I haven't used/heard of dput() before.  I'm
looking it up & understanding how it works.

Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4591003.html


Re: [R] Merge function - Return NON matches

2012-04-26 Thread MacQueen, Don
Assuming everything else is good, the all or all.x or all.y
arguments to merge() should do what I think you're asking for. You did
read the help page for merge, right?
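
One possible reading of this suggestion, sketched on hypothetical toy data (claims_df, exclude_df and the marker column in_list are invented for illustration): merge() with all.x = TRUE keeps every row of the first data.frame, and a marker column carried in from the second one ends up NA exactly where no match was found.

```r
# Hypothetical toy data standing in for bestPartAreadmin and hrc78_clm_no.
claims_df  <- data.frame(CLAIM_NO = c(20L, 21L, 22L, 83L, 84L),
                         amount   = c(10, 20, 30, 40, 50))
exclude_df <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Marker column so matched rows can be told apart after the join.
exclude_df$in_list <- TRUE

# all.x = TRUE keeps every row of claims_df; unmatched rows get NA in in_list.
joined <- merge(claims_df, exclude_df, by = "CLAIM_NO", all.x = TRUE)

# Rows with NA in the marker had no partner in exclude_df.
non_matches <- joined[is.na(joined$in_list), names(claims_df)]
print(non_matches)
```

merge() on its own does not return only the non-matching rows, so the filtering step on the marker column is still needed.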

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 4/26/12 12:33 PM, RHelpPlease rrum...@trghcsolutions.com wrote:

Hi there,
I wish to merge a common variable between a list and a data.frame & return
rows via the data.frame where there is NO match.  Here are some details:

The list, where the variable/col.name = CLAIM_NO
CLAIM_NO
20
83
1440
4439
7002
...

> dim(hrc78_clm_no)
[1] 6678    1

The data.frame, where there exists a variable with the same name,
CLAIM_NO.
> dim(bestPartAreadmin)
[1] 13068    93

I wish to merge the two together & only return a data.frame where there is
NO match in the CLAIM_NO between both files.

I've read & tried code via the merge function.  If merge can do this,
I'm missing something with the available options.

I'm figuring something like:

clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO",  ..
.. ..)

Your help is most appreciated!



--
View this message in context:
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590755.html


Re: [R] Merge function - Return NON matches

2012-04-26 Thread chuck.01
# dput() example
# let's say you have data called y, like this:
> y
  sp1 sp2 sp3 sp4
d   0   0   0   0
e   0   0   0   0
f   0   0   0   0

# ok, so do this:
> dput(y)
structure(list(sp1 = c(0, 0, 0), sp2 = c(0, 0, 0), sp3 = c(0, 
0, 0), sp4 = c(0, 0, 0)), .Names = c("sp1", "sp2", "sp3", "sp4"
), row.names = c("d", "e", "f"), class = "data.frame")

# now copy and paste that into your R terminal to see why it is so nice.
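
A short sketch of why this is useful, assuming the same small y as above: the text that dput() prints is itself R code, so evaluating it rebuilds the object. The names y_code and y_copy are only illustrative.

```r
# Rebuild the small example object from the message above.
y <- data.frame(sp1 = c(0, 0, 0), sp2 = c(0, 0, 0),
                sp3 = c(0, 0, 0), sp4 = c(0, 0, 0),
                row.names = c("d", "e", "f"))

# dput() writes R code that recreates the object; capture it as text.
y_code <- paste(capture.output(dput(y)), collapse = "\n")

# Evaluating that text (what a reader does by pasting it) rebuilds y.
y_copy <- eval(parse(text = y_code))

identical(y, y_copy)
```

This is what lets someone on the list reproduce a poster's data exactly, including column types and row names.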




RHelpPlease wrote
 
 Hi there,
 Thanks for your responses.  I haven't used/heard of dput() before.  I'm
 looking it up & understanding how it works.
 
 Thanks!
 


--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4591189.html


Re: [R] merge function

2011-07-01 Thread peter dalgaard

On Jul 1, 2011, at 06:48 , Jeff Newmiller wrote:

 You haven't provided a reproducible example.
 
 I do notice you are using T and F, which are variables that can be redefined 
 (which is why TRUE and FALSE are preferred).

Also, if x and y really are vectors (I bet they're not, though), you'll get 
the Cartesian product whatever all.x and all.y are, unless you specify by.x="x" 
and by.y="y". I.e.,

> merge(1:3,2:4,all.y=F,all.x=T)
  x y
1 1 2
2 2 2
3 3 2
4 1 3
5 2 3
6 3 3
7 1 4
8 2 4
9 3 4

> merge(1:3,2:4,by.x="x",by.y="y")
  x
1 2
2 3

> merge(1:3,2:4,by.x="x",by.y="y", all.x=T)
  x
1 1
2 2
3 3

All just to point out the importance of actual examples. Mind reading is sort 
of fun and some correspondents on mailing lists get rather good at it, but it 
is more expedient to have a well-defined problem from the outset. 

-pd

 
 Downey, Patrick pdow...@urban.org wrote:
 
 Hello,
 
 I'm clearly confused about the merge function. In the following
 
  r <- merge(x,y,all.x=T,all.y=F)
 
 my y vector has only unique values (no duplicates). So I don't understand
 how this can ever generate an r which is of greater length than x. 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] merge function

2011-06-30 Thread Downey, Patrick
I was mistaken. There were duplicates in my y vector. Please ignore my
previous message. Sorry. 
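
For anyone hitting the same surprise, a minimal sketch on invented data (the column names id, x_val and y_val are hypothetical) of how a duplicated key in y makes the result longer than x even with all.x = TRUE and all.y = FALSE:

```r
# Hypothetical toy data; id 2 appears twice in y.
x <- data.frame(id = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- data.frame(id = c(2, 2, 4), y_val = c("p", "q", "r"))

r <- merge(x, y, by = "id", all.x = TRUE, all.y = FALSE)

nrow(x)  # 3 rows in x
nrow(r)  # 4 rows in r: the x row with id 2 is paired with both y rows

# A quick check for duplicated keys before merging.
anyDuplicated(y$id) > 0
```

Each row of x is repeated once per matching row of y, so checking the key column with anyDuplicated() or duplicated() before merging can save some confusion.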


-Original Message-
From: Downey, Patrick
Sent: Thu 6/30/2011 11:08 PM
To: r-help@r-project.org
Subject: merge function
 
Hello,

I'm clearly confused about the merge function. In the following

r <- merge(x,y,all.x=T,all.y=F)

my y vector has only unique values (no duplicates). So I don't understand
how this can ever generate an r which is of greater length than x. 

I thought the default behavior was that only matching rows are included, but
that using all.x=T included rows with unmatched x's as well. If all the y's
are unique, though, I don't understand how length(r) > length(x) is
possible. 

Some clarification would be great.

Thanks,
Mitch





Re: [R] merge function

2011-06-30 Thread Jeff Newmiller
You haven't provided a reproducible example.

I do notice you are using T and F, which are variables that can be redefined 
(which is why TRUE and FALSE are preferred).
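
A small illustration of that point (the variable names flag_short and flag_long are made up): T and F are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words.

```r
T <- FALSE          # legal: T is an ordinary variable, TRUE by default
flag_short <- T     # picks up the redefined value, so this is FALSE
flag_long  <- TRUE  # TRUE is a reserved word and cannot be reassigned

rm(T)               # drop the masking copy so T refers to base::T again
```

Any call such as merge(x, y, all.x = T) made while T is masked would silently behave as all.x = FALSE.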
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Downey, Patrick pdow...@urban.org wrote:

Hello,

I'm clearly confused about the merge function. In the following

r <- merge(x,y,all.x=T,all.y=F)

my y vector has only unique values (no duplicates). So I don't understand
how this can ever generate an r which is of greater length than x. 

I thought the default behavior was that only matching rows are included, but
that using all.x=T included rows with unmatched x's as well. If all the y's
are unique, though, I don't understand how length(r) > length(x) is
possible. 

Some clarification would be great.

Thanks,
Mitch




Re: [R] merge function in R?

2010-08-16 Thread fishkbob

Thanks Chuck,

I was trying to implement something more complicated than what I had to and
after finding the reduce() function in bioconductor, everything went
smoothly. 

Thanks again

-- 
View this message in context: 
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2327133.html


Re: [R] merge function in R?

2010-08-13 Thread JesperHybel

I think it would be helpful if you could clarify your question - do you want
distinct sets? Maybe use 

unique()

but why (5,20) when it's (5,10) in the row in your example? What criteria do
you want the function to select the sets by, and what kind of output do you
need? 

Maybe it's just me who doesn't get the question..sr
-- 
View this message in context: 
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324844.html


Re: [R] merge function in R?

2010-08-13 Thread fishkbob

I too think I worded it incorrectly...

So the second two columns of the matrix are the start and end of an interval.
However, because some of the intervals overlap, I want to limit the number
of intervals I have to deal with.

So therefore,
(5 10) should merge with (7 18), making (5 18),
and then (5 18) should merge with (16 20), giving (5 20),
whereas (1 4) has no overlap with any other interval and is therefore
left on its own.

Ideal output would just be a collapsing of the matrix
sample   start   end
#        5       20
#        1       4

I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me a
c(1:4,5:20)
However, I have to do this on a very large dataset and the numbers are more
like
c(100542:100782,598322:598821,...)

any help would be appreciated
thanks 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html


Re: [R] merge function in R?

2010-08-13 Thread David Winsemius
Neither you nor your responder has continued the email chain very  
well, so let me put things back together:

on  Aug 13, 2010; 03:54pm fishkbob wrote subj = merge function in R?

So I have a bunch of c(start,end) points and want to consolidate  
them into as few c(start,end) as possible.


For example:
sample   start   end
A        5       10
B        7       18
C        1       4
D        16      20

I'd want the function to return the two distinct sets (1,4) and  
(5,20)


Is there an R function that already does this?
or should I write my own? (how would I go about that?)


In an effort to be helpful, but not copying the prior message, on  
Aug 13, 2010; 06:46pm  JesperHybel wrote:


I think it would be helpful if you could clarify your question -  
do you want distinct sets? Maybe use


unique()

but why (5,20) when it's (5,10) in the row in your example? What  
criteria do you want the function to select the sets by, and what  
kind of output do you need?


Maybe it's just me who doesn't get the question..sr


On Aug 13, 2010, at 7:01 PM, fishkbob wrote:



I too think I worded it incorrectly...

So the second two columns of the matrix are the start and end of an interval.
However, because some of the intervals overlap, I want to limit the number
of intervals I have to deal with.

So therefore,
(5 10) should merge with (7 18), making (5 18),
and then (5 18) should merge with (16 20), giving (5 20),
whereas (1 4) has no overlap with any other interval and is therefore
left on its own.

Ideal output would just be a collapsing of the matrix
sample   start   end
#        5       20
#        1       4

I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me a
c(1:4,5:20)
However, I have to do this on a very large dataset and the numbers are more
like
c(100542:100782,598322:598821,...)

any help would be appreciated
thanks
--
View this message in context: 
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html


Nabble is where I saw all of this, but Nabble is not r-help:

I suggest you sort your rows by the start variable and then examine  
where the breaks would remain by looking at the prior values of end:


> dd <- rd.txt("sample   start   end
+ A  5   10
+ B  7   18
+ C  1   4
+ D  16  20")
> dd[order(dd$start), ]
  sample start end
3      C     1   4
1      A     5  10
2      B     7  18
4      D    16  20
> ndd <- dd[order(dd$start), ]
> ndd$inprior <- c(NA, ndd[1:nrow(ndd)-1,3] >= ndd[2:nrow(ndd),2] )
> ndd
  sample start end inprior
3      C     1   4      NA
1      A     5  10   FALSE
2      B     7  18    TRUE
4      D    16  20    TRUE
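
The sort-and-compare idea above can be carried through to the collapsed intervals. The following is only a sketch in base R, not the poster's code: it builds dd with data.frame() rather than the rd.txt() helper, and the names prior_max_end, group and merged are invented. It compares each start with the largest end seen so far (via cummax), so an interval nested inside an earlier long one is still absorbed.

```r
# Same four intervals as in the thread, built with data.frame().
dd <- data.frame(sample = c("A", "B", "C", "D"),
                 start  = c(5, 7, 1, 16),
                 end    = c(10, 18, 4, 20))

ndd <- dd[order(dd$start), ]
n   <- nrow(ndd)

# Largest end among all earlier rows; -Inf for the first row.
prior_max_end <- c(-Inf, cummax(ndd$end)[-n])

# A new group starts whenever the start lies beyond every earlier end.
group <- cumsum(ndd$start > prior_max_end)

# Collapse each group to its smallest start and largest end.
merged <- data.frame(start = as.vector(tapply(ndd$start, group, min)),
                     end   = as.vector(tapply(ndd$end,   group, max)))
print(merged)
```

Because everything is vectorised, this should scale to large sets of intervals without expanding them into full integer ranges as the unique(c(...)) approach does.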

--

David Winsemius, MD
West Hartford, CT



Re: [R] merge function in R?

2010-08-13 Thread Charles C. Berry

On Fri, 13 Aug 2010, fishkbob wrote:



So I have a bunch of c(start,end) points and want to consolidate them into as
few c(start,end) as possible.

For example:
sample   start   end
A        5       10
B        7       18
C        1       4
D        16      20

I'd want the function to return the two distinct sets (1,4) and (5,20)

Is there an R function that already does this?


Yes.

See the reduce() function in the IRanges package on Bioconductor.

See pages 11-12 of

http://www.bioconductor.org/packages/2.6/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf


HTH,

Chuck



or should I write my own? (how would I go about that?)
--
View this message in context: 
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324684.html



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
