Re: [R] merge function
You do not appear to understand what merge() does. Go through the worked examples in ?merge so that you do. FWIW, I would agree that the Help file is cryptic and difficult to understand. Perhaps going through a tutorial on database join operations might help.

Cheers,
Bert

Bert Gunter
Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. -- Clifford Stoll

On Mon, Jun 1, 2015 at 7:47 AM, carol white via R-help r-help@r-project.org wrote:

I understood that by would take the intersection of names(x) and names(y), names(x) being the column names of x and names(y) the column names of y. If x has 5 columns named col1, col2, ..., col5 and y has 3 columns named col1, col2, col3, I thought that the merged data set would have 3 columns, namely col1, col2, col3, but all 5 columns, i.e. col1, col2, ..., col5, are taken if nothing is specified for the by arg.

Cheers,

On Monday, June 1, 2015 4:32 PM, Michael Dewey li...@dewey.myzen.co.uk wrote:

On 01/06/2015 14:46, carol white via R-help wrote:

Hi,
By default the merge function should take the intersection of column names (if this is understood from by = intersect(names(x), names(y))),

Dear Carol

The by parameter specifies which columns are used to merge by. Did you understand it to be which columns are retained in the result? Just a hunch, and if not then you need to give us a toy example.

but it takes all columns. How to specify the intersection of column names?

Thanks
Carol

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge function
Let me try this again. Here are the links I forgot. My apologies.

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
and
http://adv-r.had.co.nz/Reproducibility.html

John Kane
Kingston ON Canada

-----Original Message-----
From: jrkrid...@inbox.com
Sent: Mon, 1 Jun 2015 06:29:41 -0800
To: wht_...@yahoo.com, r-help@r-project.org
Subject: RE: [R] merge function

As Bert says, it is not exactly clear what you want, but is something like this what you are looking for?

    dat1 <- data.frame(aa = c("a", "b", "c"), bb = 1:3)
    dat2 <- data.frame(xx = c("b", "c", "d"), yy = 3:1)
    merge(dat1, dat2, by.x = "aa", by.y = "xx")

For further reference, here are some suggestions about asking questions on the R-help list. In particular, it is very helpful if data is supplied in dput() form (see ?dput for details).

John Kane
Kingston ON Canada

-----Original Message-----
From: r-help@r-project.org
Sent: Mon, 1 Jun 2015 13:46:15 +0000 (UTC)
To: r-help@r-project.org
Subject: [R] merge function

Hi,
By default the merge function should take the intersection of column names (if this is understood from by = intersect(names(x), names(y))), but it takes all columns. How to specify the intersection of column names?

Thanks
Carol
Re: [R] merge function
I understood that by would take the intersection of names(x) and names(y), names(x) being the column names of x and names(y) the column names of y. If x has 5 columns named col1, col2, ..., col5 and y has 3 columns named col1, col2, col3, I thought that the merged data set would have 3 columns, namely col1, col2, col3, but all 5 columns, i.e. col1, col2, ..., col5, are taken if nothing is specified for the by arg.

Cheers,

On Monday, June 1, 2015 4:32 PM, Michael Dewey li...@dewey.myzen.co.uk wrote:

On 01/06/2015 14:46, carol white via R-help wrote:

Hi,
By default the merge function should take the intersection of column names (if this is understood from by = intersect(names(x), names(y))),

Dear Carol

The by parameter specifies which columns are used to merge by. Did you understand it to be which columns are retained in the result? Just a hunch, and if not then you need to give us a toy example.

but it takes all columns. How to specify the intersection of column names?

Thanks
Carol

--
Michael
http://www.dewey.myzen.co.uk/home.html
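A minimal sketch of the distinction discussed in this exchange, using made-up data frames x and y (hypothetical, not Carol's data): by only names the key columns used to match rows, while merge() still returns every column of both inputs. If only the shared columns are wanted, one option is to select them after merging.

```r
# Toy data (hypothetical): x has 5 columns, y has 3, sharing col1..col3
x <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"), col3 = c(10, 20, 30),
                col4 = c(TRUE, FALSE, TRUE), col5 = c("u", "v", "w"))
y <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"), col3 = c(10, 20, 30))

# This is what merge() uses for 'by' when nothing is specified
common <- intersect(names(x), names(y))

m <- merge(x, y)   # rows are matched on col1, col2 and col3
names(m)           # all five columns are still present

# Keep only the shared columns, if that is the goal
m_common <- m[, common, drop = FALSE]
names(m_common)
```

Selecting columns after the merge keeps the join itself unchanged; subsetting x before merging would work just as well here.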
Re: [R] merge function
1. Please read and follow the posting guide.
2. Reproducible example? (... at least I don't understand what you mean)
3. Plain text, not HTML.

Cheers,
Bert

Bert Gunter
Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. -- Clifford Stoll

On Mon, Jun 1, 2015 at 6:46 AM, carol white via R-help r-help@r-project.org wrote:

Hi,
By default the merge function should take the intersection of column names (if this is understood from by = intersect(names(x), names(y))), but it takes all columns. How to specify the intersection of column names?

Thanks
Carol
Re: [R] merge function
On 01/06/2015 14:46, carol white via R-help wrote:

Hi,
By default the merge function should take the intersection of column names (if this is understood from by = intersect(names(x), names(y))),

Dear Carol

The by parameter specifies which columns are used to merge by. Did you understand it to be which columns are retained in the result? Just a hunch, and if not then you need to give us a toy example.

but it takes all columns. How to specify the intersection of column names?

Thanks
Carol

--
Michael
http://www.dewey.myzen.co.uk/home.html
Re: [R] merge function
As Bert says, it is not exactly clear what you want, but is something like this what you are looking for?

    dat1 <- data.frame(aa = c("a", "b", "c"), bb = 1:3)
    dat2 <- data.frame(xx = c("b", "c", "d"), yy = 3:1)
    merge(dat1, dat2, by.x = "aa", by.y = "xx")

For further reference, here are some suggestions about asking questions on the R-help list. In particular, it is very helpful if data is supplied in dput() form (see ?dput for details).

John Kane
Kingston ON Canada

-----Original Message-----
From: r-help@r-project.org
Sent: Mon, 1 Jun 2015 13:46:15 +0000 (UTC)
To: r-help@r-project.org
Subject: [R] merge function

Hi,
By default the merge function should take the intersection of column names (if this is understood from by = intersect(names(x), names(y))), but it takes all columns. How to specify the intersection of column names?

Thanks
Carol
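As an illustration of the dput() suggestion, here is a small sketch using the dat1 toy data from the message above. The round trip through capture.output() and parse() is only there to show that the printed text is enough to rebuild the object; in practice the output of dput() is simply pasted into the email.

```r
dat1 <- data.frame(aa = c("a", "b", "c"), bb = 1:3)

# dput() prints an R expression that recreates the object
dput(dat1)

# Round trip: capture that text and evaluate it again
txt <- paste(capture.output(dput(dat1)), collapse = "\n")
dat1_copy <- eval(parse(text = txt))

# Check that the rebuilt copy matches the original
all.equal(dat1, dat1_copy)
```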
Re: [R] merge function
Exactly what I thought too the first time I read ?merge. R sometimes has its own approach.

John Kane
Kingston ON Canada

-----Original Message-----
From: r-help@r-project.org
Sent: Mon, 1 Jun 2015 14:47:07 +0000 (UTC)
To: li...@dewey.myzen.co.uk, r-help@r-project.org
Subject: Re: [R] merge function

I understood that by would take the intersection of names(x) and names(y), names(x) being the column names of x and names(y) the column names of y. If x has 5 columns named col1, col2, ..., col5 and y has 3 columns named col1, col2, col3, I thought that the merged data set would have 3 columns, namely col1, col2, col3, but all 5 columns, i.e. col1, col2, ..., col5, are taken if nothing is specified for the by arg.

Cheers,
Re: [R] merge function to combine two tables
Below, inline.

John Kane
Kingston ON Canada

-----Original Message-----
From: michael.eisenr...@gmx.ch
Sent: Thu, 14 Mar 2013 11:51:49 +0100
To: r-help@r-project.org
Subject: [R] merge function to combine two tables

Dear R-help members

I would be grateful if anyone could help me with the following problem: I would like to combine two matrices (Schmitt_15 and Schmitt_16, they are attached) which have a species presence/absence x sampling plot structure. The aim would be to have in the end only one matrix which shows all existing species and their presence/absence on all the different plots. To do this I used the merge function in R. The problem is that my matrix in the end shows only 12 species (but there are in total about 100!). I don't know why.

I used the following commands:

    Schmitt_15
    Schmitt_16
    output<-merge(Schmitt_15,Schmitt_16,by="species")

# You seem to be picking out only the species common to the two data.frames.

    ncol(output)
    length(unique(output$species))
    Schmitt_15$species %in% Schmitt_16$species

# This may do what you want. It means that you are taking every species name found in either file. Is that what you want?

    newdat <- merge(Schmitt_15, Schmitt_16, by = "species", all = TRUE)

This gives me a merged file with

# You seem to have missed a step here, since there is no ab object in your code.

    write.table(ab,file="output.txt",sep=",")

Can anyone help me?
Thank you very much!
Michael
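A small sketch of the difference between the default merge and all = TRUE, using two made-up presence/absence tables (s15 and s16 here are hypothetical stand-ins, not the attached files). Replacing NA with 0 afterwards is an assumption about what "absent" should look like in the combined table.

```r
# Hypothetical presence/absence tables: one row per species, one column per plot
s15 <- data.frame(species = c("A", "B", "C"), plot1 = c(1, 0, 1))
s16 <- data.frame(species = c("B", "C", "D"), plot2 = c(1, 1, 0))

inner <- merge(s15, s16, by = "species")              # species found in both tables
full  <- merge(s15, s16, by = "species", all = TRUE)  # species found in either table

# Species missing from one table get NA in that table's plot columns;
# for presence/absence data it may be reasonable to treat NA as absence.
plots <- setdiff(names(full), "species")
full[plots][is.na(full[plots])] <- 0
full
```

With all = TRUE every species from either table survives the merge, which is what the stated goal (one table listing all existing species) appears to need.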
Re: [R] merge function to combine two tables
Take a look at your data. When I loaded what you attached, there were only 9 species that were in common across the two files:

    > dim(s16)
    [1] 226  83
    > dim(s15)
    [1] 96 41
    > sum(s15$species %in% s16$species)
    [1] 10
    > sum(s16$species %in% s15$species)
    [1] 10
    > length(intersect(s16$species, s15$species))
    [1] 9
    > length(unique(s16$species))
    [1] 173
    > length(unique(s15$species))
    [1] 90
    > x <- merge(s16, s15, by = 'species')
    > dim(x)
    [1]  12 123

so it is not surprising you got the result that you did.

On Thu, Mar 14, 2013 at 6:51 AM, Michael Eisenring michael.eisenr...@gmx.ch wrote:

Dear R-help members

I would be grateful if anyone could help me with the following problem: I would like to combine two matrices (Schmitt_15 and Schmitt_16, they are attached) which have a species presence/absence x sampling plot structure. The aim would be to have in the end only one matrix which shows all existing species and their presence/absence on all the different plots. To do this I used the merge function in R. The problem is that my matrix in the end shows only 12 species (but there are in total about 100!). I don't know why.

I used the following commands:

    Schmitt_15
    Schmitt_16
    output<-merge(Schmitt_15,Schmitt_16,by="species")
    write.table(ab,file="output.txt",sep=",")

Can anyone help me?
Thank you very much!
Michael

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
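The counts in this message (10 matching rows on each side, 9 distinct species in common, 12 merged rows) are consistent with one species name occurring twice in each file: 8 species matching once give 8 rows, and the duplicated one gives 2 x 2 = 4. This is an inference from the numbers, not something checked against the attached data. A toy sketch with hypothetical data frames a and b shows how duplicated keys multiply rows:

```r
# Hypothetical data: species "B" appears twice in each table
a <- data.frame(species = c("A", "B", "B", "C"), plot1 = c(1, 0, 1, 1))
b <- data.frame(species = c("A", "B", "B", "D"), plot2 = c(0, 1, 1, 1))

length(intersect(a$species, b$species))  # 2 distinct species in common
m <- merge(a, b, by = "species")
nrow(m)                                  # 1 row for A plus 2 * 2 rows for B = 5

# Duplicated keys can be located before merging
a$species[duplicated(a$species)]
b$species[duplicated(b$species)]
```

Checking for duplicated keys first makes it easier to decide whether the repeated rows are genuine or a data-entry problem.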
Re: [R] merge function while obviating duplicate columns XXXX
On Mon, Mar 11, 2013 at 3:17 PM, Dan Abner dan.abne...@gmail.com wrote:

Hi everyone,

I have the following call to the merge() function. How does one prevent duplicate columns in the resulting data frame that the 2 parent data frames have in common but are not true key or by variables?

    data3<-merge(data1,data2,by="id")
    data3
    id total.x total.y balance
     1      78      78      90
     2      91      91      63
     3      74      74      57
     4      89      89      58
     5      90      90      27

In this example, total is not a true key or by variable that uniquely identifies rows suitable for matching purposes, but instead just happens to be common to both sets.

Well, which one do you want? Or do you want to exclude total from the result?

In reality, I have hundreds of these in common variables, so I need a solution that is tractable for a large number of in common columns.

Thanks!
Dan
Re: [R] merge function while obviating duplicate columns XXXX
Ok, let's say I only want the common columns from data1. Is there a succinct way of doing this for potentially hundreds of in common columns?
Re: [R] merge function while obviating duplicate columns XXXX
    intersect(names(data1),names(data2))

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Dan Abner dan.abne...@gmail.com wrote:

Ok, let's say I only want the common columns from data1. Is there a succinct way of doing this for potentially hundreds of in common columns?
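One way the intersect() result might be put to use, sketched with made-up data that resembles the example in this thread (the values and the choice of "id" as the only key are assumptions): find the shared non-key columns and drop them from data2 before merging, so that data1's copies are the ones kept.

```r
# Hypothetical data: 'total' is present in both data frames but is not a key
data1 <- data.frame(id = 1:5, total = c(78, 91, 74, 89, 90))
data2 <- data.frame(id = 1:5, total = c(78, 91, 74, 89, 90),
                    balance = c(90, 63, 57, 58, 27))

common <- intersect(names(data1), names(data2))  # every shared column name
dup    <- setdiff(common, "id")                  # shared columns that are not keys

# Drop the shared non-key columns from data2, then merge on the key
data3 <- merge(data1, data2[, !(names(data2) %in% dup), drop = FALSE], by = "id")
names(data3)
```

Because dup is computed from the column names, the same lines apply unchanged whether there is one shared column or hundreds.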
Re: [R] merge function while obviating duplicate columns XXXX
You can use the set-oriented functions setdiff(), union(), and intersect(). E.g., setdiff(colnames(data2), colnames(data1)) gives the names of columns of data2 that are not names of columns of data1. The following might be what you want:

    merge(data1, data2[, c("id", setdiff(colnames(data2),colnames(data1)))], by="id")

You didn't give an example of the data nor the desired result so I made some up:

    > data1 <- data.frame(id=c(1,1,2,3), Name=c("Joe","Joe","Ken","Leo"))
    > data2 <- data.frame(id=c(2,3), Name=c("Melody","Nell"), Age=c(45,49))
    > merge(data1, data2, by="id")
      id Name.x Name.y Age
    1  2    Ken Melody  45
    2  3    Leo   Nell  49
    > merge(data1, data2[, c("id", setdiff(colnames(data2),colnames(data1)))], by="id")
      id Name Age
    1  2  Ken  45
    2  3  Leo  49

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dan Abner
Sent: Monday, March 11, 2013 2:02 PM
To: Ista Zahn
Cc: r-help@r-project.org
Subject: Re: [R] merge function while obviating duplicate columns

Ok, let's say I only want the common columns from data1. Is there a succinct way of doing this for potentially hundreds of in common columns?
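A possible alternative, sketched here as an assumption rather than anything proposed in the thread: let merge() create the duplicates, but use its suffixes argument so that data1's columns keep their plain names and data2's copies are tagged, then drop the tagged columns. The data below are hypothetical.

```r
# Hypothetical data: 'Name' is shared by both data frames but is not a key
data1 <- data.frame(id = c(1, 2, 3), Name = c("Joe", "Ken", "Leo"))
data2 <- data.frame(id = c(2, 3), Name = c("Melody", "Nell"), Age = c(45, 49))

# Leave data1's shared columns unsuffixed and tag data2's copies with ".y"
m <- merge(data1, data2, by = "id", suffixes = c("", ".y"))

# Drop the tagged duplicates. Caution: this would also drop any column
# whose original name happens to end in ".y".
res <- m[, !grepl("\\.y$", names(m)), drop = FALSE]
names(res)
```

Dropping the shared columns from data2 before merging avoids the name-pattern caveat, so this variant is mainly useful when the duplicates need to be inspected first.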
Re: [R] Merge function - Return NON matches
Hi there,

I've tried the noted solutions:

If you do `no <- unlist(hrc_78_clm_no)`, do you get a character vector of claim numbers you want to exclude? If so, then `subset(whatever, !CLAIM_NO %in% no)` should work.

I converted the CLAIM_NO list to a character, with

    > hrc78_clmno_char <- format(as.character(hrc78_clm_no))
    > is.character(hrc78_clmno_char)
    [1] TRUE

Then I applied your code (above), which didn't work. Thanks though!

Thanks for the dput() help. Here is truncated output of the list (its class is data.frame; I call it a list for communication's sake) and the data.frame. Again, your help is most appreciated!

Goal: merge the list and data.frame together. Output the data.frame, but with rows where the CLAIM_NO variable between the list and data.frame *do not match*.

*The List*

    truncated_list <- hrc78_clm_no[1:100,]  # So you can see consistency in previously-mentioned variables
    truncated_list <- structure(list(CLAIM_NO = c(20L, 83L, 1440L, 4439L, 7002L, 9562L, 10463L, 12503L, 16195L, 22987L, 30760L, 32108L, 32640L, 33045L, 36241L, 37091L, 37934L, 38663L, 39456L, 40544L, 40630L, 40679L, 40734L, 43054L, 53483L, 54155L, 56151L, 58113L, 61050L, 62056L, 63014L, 68486L, 68541L, 69298L, 69983L, 73379L, 76810L, 79975L, 91124L, 97697L, 100524L, 105808L, 112659L, 112955L, 113422L, 114522L, 124159L, 133566L, 135167L, 137387L, 137954L, 138186L, 144574L, 148573L, 150013L, 152193L, 154680L, 155414L, 165954L, 171223L, 175077L, 176359L, 177656L, 178155L, 182250L, 182393L, 182832L, 184245L, 185542L, 186038L, 186087L, 186098L, 186294L, 186550L, 186897L, 187025L, 190180L, 191472L, 192593L, 196207L, 196689L, 197372L, 197537L, 197590L, 197730L, 197874L, 198294L, 198750L, 198823L, 199076L, 199233L, 199284L, 199468L, 199661L, 199913L, 200150L, 200279L, 200473L, 200927L, 202407L)), .Names = "CLAIM_NO", row.names = c(NA, -100L), class = "data.frame")

*The (multi-column) data.frame, but greatly truncated*

    truncated_dataframe <- bestPartAreadmin[1:25, 1:4]
    truncated_dataframe <- structure(list(DESY_SORT_KEY = c(10193L, 10193L, 10193L, 10574L,
10574L, 19213L, 19213L, 19213L, 100026636L, 100040718L, 100055111L, 100060558L, 100060558L, 100060558L, 100072978L, 100096346L, 100130451L, 100168782L, 100168782L, 100168782L, 100168782L, 100168782L, 100168782L, 100174887L, 100177905L), PRVDR_NUM = structure(c(1368L, 1353L, 1406L, 149L, 149L, 1362L, 1393L, 1367L, 1557L, 1370L, 1360L, 1362L, 1362L, 1362L, 1372L, 1358L, 193L, 196L, 196L, 61L, 166L, 196L, 196L, 311L, 1363L), .Label = c(010001, 010006, 010015, 010016, 010029, 010033, 010034, 010035, 010039, 010040, 010046, 010049, 010083, 010092, 010108, 010131, 010149, 01S001, 01S033, 01S046, 01S145, 020001, 020006, 020012, 020017, 021306, 021311, 030002, 030006, 030007, 030010, 030011, 030012, 030013, 030014, 030016, 030023, 030024, 030030, 030033, 030036, 030037, 030038, 030043, 030055, 030061, 030062, 030064, 030065, 030067, 030069, 030078, 030083, 030085, 030087, 030088, 030089, 030092, 030093, 030100, 030101, 030102, 030103, 030105, 030108, 030110, 030111, 030114, 030115, 030117, 030118, 030119, 030120, 030121, 030122, 030123, 030126, 030128, 031300, 031305, 031311, 032000, 032001, 032002, 032006, 033025, 033028, 033029, 033032, 033034, 033036, 034004, 034013, 034020, 034024, 03S002, 03S006, 03S007, 03S016, 03S022, 03S023, 03S089, 03T002, 03T055, 03T061, 03T069, 03T093, 03T103, 03T114, 03T117, 03T126, 040004, 040007, 040010, 040011, 040016, 040022, 040026, 040027, 040029, 040036, 040041, 040047, 040055, 040062, 040072, 040080, 040084, 040088, 040091, 040114, 040118, 040119, 043028, 044005, 04S027, 04S084, 04T041, 04T062, 04T119, 050002, 050006, 050007, 050008, 050009, 050013, 050014, 050016, 050017, 050018, 050022, 050024, 050025, 050026, 050030, 050036, 050038, 050039, 050040, 050042, 050043, 050045, 050046, 050047, 050055, 050056, 050057, 050058, 050060, 050063, 050069, 050070, 050071, 050073, 050075, 050076, 050077, 050078, 050079, 050082, 050084, 050089, 050090, 050091, 050093, 050099, 050100, 050101, 050102, 050103, 050104, 050107, 050108, 050110, 050111, 
050112, 050113, 050115, 050116, 050118, 050121, 050122, 050124, 050125, 050126, 050128, 050129, 050131, 050132, 050133, 050135, 050136, 050137, 050138, 050139, 050140, 050145, 050146, 050149, 050150, 050152, 050153, 050158, 050159, 050168, 050169, 050174, 050179, 050180, 050188, 050191, 050193, 050195, 050196, 050197, 050204, 050211, 050219, 050222, 050224, 050225, 050226, 050228, 050230, 050231, 050232, 050234, 050235, 050236, 050238, 050239, 050242, 050243, 050245, 050248, 050254, 050257, 050261, 050262, 050264, 050272, 050276, 050277, 050278, 050279, 050280, 050283, 050289, 050290, 050291, 050292, 050295, 050296, 050298, 050300, 050301, 050305, 050308, 050309, 050313, 050315, 050320, 050324, 050327, 050329, 050334, 050335, 050336, 050342, 050348, 050351, 050352, 050353, 050359, 050360, 050366, 050367, 050373, 050376, 050378, 050380, 050382, 050385, 050390, 050393,
Re: [R] Merge function - Return NON matches
Hi

If you used shorter names for your objects, you would probably get more readable advice.

Is this what you wanted?

    truncated_dataframe[truncated_dataframe$CLAIM_NO %in% setdiff(truncated_dataframe$CLAIM_NO, truncated_list$CLAIM_NO),]

Regards
Petr
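A small sketch of the non-match selection on made-up data (the claims and admin data frames and their values are hypothetical): rows of the larger table are kept when their CLAIM_NO does not appear in the list. The setdiff() form suggested above and a plain negated %in% select the same rows.

```r
# Hypothetical stand-ins for the claim list and the multi-column data frame
claims <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))
admin  <- data.frame(CLAIM_NO = c(20L, 555L, 1440L, 777L),
                     amount   = c(10, 20, 30, 40))

# Rows of admin whose CLAIM_NO is not in the claim list
unmatched <- admin[!(admin$CLAIM_NO %in% claims$CLAIM_NO), ]

# The same selection written with setdiff()
unmatched2 <- admin[admin$CLAIM_NO %in% setdiff(admin$CLAIM_NO, claims$CLAIM_NO), ]

identical(unmatched, unmatched2)
unmatched
```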
Re: [R] Merge function - Return NON matches
Hi again, Petr, your solution worked! Thanks everyone for your input. I'll look more into setdiff. Cheers! -- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4593101.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge function - Return NON matches
Hi,

To increase the chances of you getting help on this one, please give example data (a small data.frame, a small list) that you are trying to do this on, and also show the desired output. Whip these variables up in your R workspace and paste the output of `dput` for each into your follow-up email.

It's hard (for me, anyway) to get what you're after ... I'm guessing something that ends up looking like this will be one solution:

subset(your.df, !CLAIM_NO %in% `something`)

but it's hard for me to tell from where I'm sitting.

-steve

On Thu, Apr 26, 2012 at 3:33 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi there,
> I wish to merge a common variable between a list and a data.frame & return rows via the data.frame where there is NO match. Here are some details:
>
> The list, where the variable/col.name = CLAIM_NO
>
> CLAIM_NO
> 20
> 83
> 1440
> 4439
> 7002
> ...
>
> dim(hrc78_clm_no)
> [1] 6678 1
>
> The data.frame, where there exists a variable with the same name, CLAIM_NO.
>
> dim(bestPartAreadmin)
> [1] 13068 93
>
> I wish to merge the two together & only return a data.frame where there is NO match in the CLAIM_NO between both files.
>
> I've read & tried code via the merge function. If merge can do this, I'm missing something with the available options. I'm figuring something like:
>
> clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO", .. .. ..)
>
> Your help is most appreciated!
>
> -- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590755.html

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Merge function - Return NON matches
Hi Steve,

Thanks for replying. Here's a small piece of the data.frame:

bestPartAreadmin[1:5,1:6]
  DESY_SORT_KEY PRVDR_NUM CLM_THRU_DT CLAIM_NO NCH_NEAR_LINE_REC_IDEN_CD NCH_CLM_TYPE_CD
1         10193    290003    20090323       20                         V              60
2         10193    290045    20091124       21                         V              60
3         10193    29T003    20090401       22                         V              60
4         10574    050017    20090527       83                         V              60
5         10574    050017    20090921       84                         V              60

There are 93 columns total in the data.frame, so these are the first six, where you can see CLAIM_NO.

I wish for the resultant data.frame to look just like the data.frame above, but containing only the rows whose CLAIM_NO values (above) differ from/don't match the CLAIM_NO values in the list (hrc78_clm_no).

Does this help? Thanks!

-- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590810.html
Re: [R] Merge function - Return NON matches
Hi again,

I tried the sample code like this:

merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
dim(merged_clmno)
[1] 13068 93

Note that:

dim(bestPartAreadmin)
[1] 13068 93

So, no change between the original data.frame (bestPartAreadmin) & the (should be) fewer-rows merged_clmno data.frame.

Any further help is most appreciated!

-- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590851.html
Re: [R] Merge function - Return NON matches
You'd get better help if you actually did as Steve requested and provided sample data (a reproducible example!) using dput(). But since you didn't:

fakedata <- data.frame(a = 1:5, b = 11:15, c = c(1,1,1,2,2))
fakedata
  a  b c
1 1 11 1
2 2 12 1
3 3 13 1
4 4 14 2
5 5 15 2

notb <- c(12, 14, 15)
subset(fakedata, !b %in% notb)
  a  b c
1 1 11 1
3 3 13 1

Since you say that doesn't work for you, you absolutely have to provide us with a reproducible example for anyone to be able to diagnose your problem.

Sarah

On Thu, Apr 26, 2012 at 4:12 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi again,
> I tried the sample code like this:
>
> merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
> dim(merged_clmno)
> [1] 13068 93
>
> Note that:
> dim(bestPartAreadmin)
> [1] 13068 93
>
> So, no change between the original data.frame (bestPartAreadmin) & the (should be) less-rows merged_clmno data.frame.
>
> Any further help is most appreciated!

-- 
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] Merge function - Return NON matches
Hi,

As Sarah reiterated -- it'd *really* be helpful if you give us data we can actually work with. That having been said:

On Thu, Apr 26, 2012 at 4:12 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi again,
> I tried the sample code like this:
>
> merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
> dim(merged_clmno)
> [1] 13068 93
>
> Note that:
> dim(bestPartAreadmin)
> [1] 13068 93
>
> So, no change between the original data.frame (bestPartAreadmin) & the (should be) less-rows merged_clmno data.frame.

Your original email said you had a list that contains CLAIM_NO's you want to exclude. Is `hrc78_clm_no` this list -- does it only have claim_no's?

Passing a list into the subset call after `%in%` won't work. If you do `no <- unlist(hrc78_clm_no)`, do you get a character vector of claim numbers you want to exclude?

If so, then `subset(whatever, !CLAIM_NO %in% no)` should work.

HTH,
-steve

> Any further help is most appreciated!
>
> -- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590851.html

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
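The point about `%in%` and lists can be checked on toy data. The sketch below is only an illustration under stated assumptions: the two objects reuse the thread's names, but their values are made up, and `hrc78_clm_no` is assumed to be a one-column data.frame (as the `dim()` output in the thread suggests). It also shows `setdiff()`, which was mentioned in the thread's final message.

```r
# Toy stand-ins for the thread's objects (values are illustrative only).
bestPartAreadmin <- data.frame(
  DESY_SORT_KEY = c(10193L, 10193L, 10193L, 10574L, 10574L),
  CLAIM_NO      = c(20L, 21L, 22L, 83L, 84L)
)
hrc78_clm_no <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Passing the whole one-column data.frame after %in%:
# no claim number matches, so no rows are dropped.
unchanged <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)

# Passing the column as a plain vector gives the intended filter.
no <- unlist(hrc78_clm_no, use.names = FALSE)  # same as hrc78_clm_no$CLAIM_NO here
no_match <- subset(bestPartAreadmin, !CLAIM_NO %in% no)

# setdiff() returns the unmatched claim numbers themselves.
unmatched_ids <- setdiff(bestPartAreadmin$CLAIM_NO, no)

nrow(unchanged)
no_match$CLAIM_NO
unmatched_ids
```

With real data, `hrc78_clm_no$CLAIM_NO` is probably the more readable way to pull out the vector; `unlist()` is kept here because it is what the reply suggests.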
Re: [R] Merge function - Return NON matches
Hi there,

Thanks for your responses. I haven't used/heard of dput() before. I'm looking it up & understanding how it works.

Thanks!

-- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4591003.html
Re: [R] Merge function - Return NON matches
Assuming everything else is good, the all or all.x or all.y arguments to merge() should do what I think you're asking for.

You did read the help page for merge, right?

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 4/26/12 12:33 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi there,
> I wish to merge a common variable between a list and a data.frame & return rows via the data.frame where there is NO match. Here are some details:
>
> The list, where the variable/col.name = CLAIM_NO
>
> CLAIM_NO
> 20
> 83
> 1440
> 4439
> 7002
> ...
>
> dim(hrc78_clm_no)
> [1] 6678 1
>
> The data.frame, where there exists a variable with the same name, CLAIM_NO.
>
> dim(bestPartAreadmin)
> [1] 13068 93
>
> I wish to merge the two together & only return a data.frame where there is NO match in the CLAIM_NO between both files.
>
> I've read & tried code via the merge function. If merge can do this, I'm missing something with the available options. I'm figuring something like:
>
> clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO", .. .. ..)
>
> Your help is most appreciated!
>
> -- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590755.html
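One way the `all.x` suggestion could be turned into a "non-matches only" result is sketched below. This is an assumption about what was meant, not something spelled out in the thread: the marker column `in_list` is a hypothetical helper, and the data values are invented for illustration.

```r
# Toy stand-ins for the thread's objects (values are illustrative only).
bestPartAreadmin <- data.frame(
  CLAIM_NO  = c(20L, 21L, 22L, 83L, 84L),
  PRVDR_NUM = c("290003", "290045", "29T003", "050017", "050017"),
  stringsAsFactors = FALSE
)
hrc78_clm_no <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Hypothetical helper column: marks rows that come from the exclusion list.
hrc78_clm_no$in_list <- TRUE

# all.x = TRUE keeps every row of the first argument;
# rows without a partner in the list get NA in in_list.
merged <- merge(bestPartAreadmin, hrc78_clm_no, by = "CLAIM_NO", all.x = TRUE)

# Keep only the unmatched rows, and only the original columns.
clm_no_nomatch <- merged[is.na(merged$in_list), names(bestPartAreadmin)]
clm_no_nomatch
```

Note that merge() sorts the result by the key column by default, so the row order may differ from the original data.frame.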
Re: [R] Merge function - Return NON matches
# dput() example
# let's say you have data called y, like this:
y
  sp1 sp2 sp3 sp4
d   0   0   0   0
e   0   0   0   0
f   0   0   0   0

# ok, so do this:
dput(y)
structure(list(sp1 = c(0, 0, 0), sp2 = c(0, 0, 0), sp3 = c(0, 0, 0), sp4 = c(0, 0, 0)), .Names = c("sp1", "sp2", "sp3", "sp4"), row.names = c("d", "e", "f"), class = "data.frame")

# now copy and paste that into your R terminal to see why it is so nice.

RHelpPlease wrote
> Hi there,
> Thanks for your responses. I haven't used/heard of dput() before. I'm looking it up & understanding how it works.
> Thanks!

-- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4591189.html
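Why dput() output is "so nice" can be seen with a round trip: the printed text is itself R code that rebuilds the object. The sketch below recreates the example's `y` (an assumption, since the thread only shows its printout) and re-evaluates the dput() text; `txt` and `y2` are names introduced here for illustration.

```r
# Rebuild the example object from its printout.
y <- data.frame(sp1 = c(0, 0, 0), sp2 = c(0, 0, 0),
                sp3 = c(0, 0, 0), sp4 = c(0, 0, 0),
                row.names = c("d", "e", "f"))

# Capture what dput() prints, as it would appear in an email.
txt <- capture.output(dput(y))

# A reader pasting that text into R gets the same object back.
y2 <- eval(parse(text = txt))
identical(y, y2)
```

For a large object, something like `dput(head(x, 20))` keeps the pasted text short while still preserving column types.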
Re: [R] merge function
On Jul 1, 2011, at 06:48 , Jeff Newmiller wrote:

> You haven't provided a reproducible example. I do notice you are using T and F which are variables that can be redefined (which is why TRUE and FALSE are preferred).

Also, if x and y really are vectors (I bet they're not, though), you'll get the cartesian product whatever all.x and all.y are, unless you specify by.x="x" and by.y="y". I.e.,

merge(1:3,2:4,all.y=F,all.x=T)
  x y
1 1 2
2 2 2
3 3 2
4 1 3
5 2 3
6 3 3
7 1 4
8 2 4
9 3 4

merge(1:3,2:4,by.x="x",by.y="y")
  x
1 2
2 3

merge(1:3,2:4,by.x="x",by.y="y", all.x=T)
  x
1 1
2 2
3 3

All just to point out the importance of actual examples. Mind reading is sort of fun and some correspondents on mailing lists get rather good at it, but it is more expedient to have a well-defined problem from the outset.

-pd

Downey, Patrick pdow...@urban.org wrote:
> Hello,
> I'm clearly confused about the merge function. In the following
>
> r <- merge(x,y,all.x=T,all.y=F)
>
> my y vector has only unique values (no duplicates). So I don't understand how this can ever generate an r which is of greater length than x.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] merge function
I was mistaken. There were duplicates in my y vector. Please ignore my previous message. Sorry.

-----Original Message-----
From: Downey, Patrick
Sent: Thu 6/30/2011 11:08 PM
To: r-help@r-project.org
Subject: merge function

Hello,

I'm clearly confused about the merge function. In the following

r <- merge(x,y,all.x=T,all.y=F)

my y vector has only unique values (no duplicates). So I don't understand how this can ever generate an r which is of greater length than x.

I thought the default behavior was only matching rows are included, but that using all.x=T included rows with unmatched x's as well. If all the y's are unique, though, I don't understand how length(r) > length(x) is possible.

Some clarification would be great.

Thanks,
Mitch
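The resolution above (duplicates in y) is easy to reproduce. The following is a minimal sketch on invented data, assuming x and y are data.frames sharing a key column called `id`: each x row is repeated once per matching y row, so duplicate keys in y make the result longer than x even with all.y = FALSE.

```r
x <- data.frame(id = c(1, 2, 3), a = c("p", "q", "r"))
y <- data.frame(id = c(2, 2, 4), b = c("u", "v", "w"))  # id 2 appears twice

# Left join: every x row is kept; id 2 matches two y rows.
r <- merge(x, y, all.x = TRUE, all.y = FALSE)

nrow(x)
nrow(r)

# A quick check for duplicate keys before merging.
anyDuplicated(y$id) > 0
```

Running `anyDuplicated()` (or `table()`) on the key column of each input before merging is a cheap way to catch this.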
Re: [R] merge function
You haven't provided a reproducible example. I do notice you are using T and F which are variables that can be redefined (which is why TRUE and FALSE are preferred).

---
Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---
Sent from my phone. Please excuse my brevity.

Downey, Patrick pdow...@urban.org wrote:
> Hello,
> I'm clearly confused about the merge function. In the following
>
> r <- merge(x,y,all.x=T,all.y=F)
>
> my y vector has only unique values (no duplicates). So I don't understand how this can ever generate an r which is of greater length than x.
>
> I thought the default behavior was only matching rows are included, but that using all.x=T included rows with unmatched x's as well. If all the y's are unique, though, I don't understand how length(r) > length(x) is possible.
>
> Some clarification would be great.
>
> Thanks,
> Mitch
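The remark about T and F can be illustrated with a short sketch. The data and the variable names `rows_with_T` and `rows_with_TRUE` are made up for the illustration; the redefinition is done inside local() so it does not leak into the rest of the session.

```r
x <- data.frame(id = 1:3)
y <- data.frame(id = 2:4)

rows_with_T <- local({
  T <- FALSE                    # legal: T is an ordinary variable
  nrow(merge(x, y, all.x = T))  # silently behaves like all.x = FALSE
})

rows_with_TRUE <- nrow(merge(x, y, all.x = TRUE))

rows_with_T
rows_with_TRUE
```

TRUE and FALSE are reserved words, so an assignment such as `TRUE <- 0` is an error rather than a silent change in behaviour.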
Re: [R] merge function in R?
Thanks Chuck, I was trying to implement something more complicated than what I had to and after finding the reduce() function in bioconductor, everything went smoothly. Thanks again

-- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2327133.html
Re: [R] merge function in R?
I think it would be helpful if you could clarify your question - do you want distinct sets - maybe use unique() but why (5,20) when it's (5,10) in the row in your example? What criteria do you want the function to select the sets by and what kind of output do you need?

Maybe it's just me who doesn't get the question..sr

-- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324844.html
Re: [R] merge function in R?
I too think I worded it incorrectly... so the second two columns of the matrix are the start and end of an interval. However, because some of the intervals overlap, I want to limit the number of intervals I have to deal with.

So therefore, (5 10) should merge with (7 18) making (5 18), and then (5 18) should merge with (16 20) giving (5 20), whereas (1 4) has no overlap with any other interval and is therefore left on its own.

Ideal output would just be a collapsing of the matrix:

sample start end
#      5     20
#      1     4

I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me a c(1:4,5:20). However, I have to do this on a very large dataset and the numbers are more like c(100542:100782,598322:598821,...)

Any help would be appreciated, thanks

-- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html
Re: [R] merge function in R?
Neither you nor your responder have continued the email chain very well so let me put things back together:

on Aug 13, 2010; 03:54pm fishkbob wrote
subj = merge function in R?
> So I have a bunch of c(start,end) points and want to consolidate them into as few c(start,end) as possible. For example:
>
> sample start end
> A      5     10
> B      7     18
> C      1     4
> D      16    20
>
> I'd want the function to return the two distinct sets (1,4) and (5,20)
> Is there an R function that already does this? or should I write my own? (how would I go about that?)

In an effort to be helpful but not copying the prior message on Aug 13, 2010; 06:46pm JesperHybel wrote:
> I think it would be helpful if you could clarify youre question - do you want distinct sets - maybe use unique() but why (5,20) when its (5,10) in the row in youre example? What criteria do you want the function to select the sets by and what kind of output do you need?
> Maybe it's just me who dosn't get the question..sr

On Aug 13, 2010, at 7:01 PM, fishkbob wrote:
> I too think I worded it incorrectly... so the second two columns of the matrix are the start and end of an interval however, because some of the intervals overlap, I want to limit the number of intervals I have to deal with.
> So therefore, (5 10) should merge with (7 18) making (5 18) and then (5 18) should merge with (16 20) giving (5 20) whereas (1 4) has no overlap with any other interval and is therefore left on its own
> Ideal output would just be a collapsing of the matrix
>
> sample start end
> #      5     20
> #      1     4
>
> I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me a c(1:4,5:20) However, I have to do this on a very large dataset and the numbers are more like c(100542:100782,598322:598821,...) any help would be appreciated thanks
>
> -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html

Nabble is where I saw all of this, but Nabble is not r-help.

I suggest you sort your rows by the start variable and then examine where the breaks would remain by looking at the prior values of end:

dd <- rd.txt("sample start end
A 5 10
B 7 18
C 1 4
D 16 20")
dd[order(dd$start), ]
  sample start end
3      C     1   4
1      A     5  10
2      B     7  18
4      D    16  20

ndd <- dd[order(dd$start), ]
ndd$inprior <- c(NA, ndd[1:nrow(ndd)-1,3] >= ndd[2:nrow(ndd),2] )
ndd
  sample start end inprior
3      C     1   4      NA
1      A     5  10   FALSE
2      B     7  18    TRUE
4      D    16  20    TRUE

-- 
David Winsemius, MD
West Hartford, CT
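The sort-then-compare idea above stops at flagging which rows overlap the previous one. One way it might be carried through to the collapsed intervals in base R is sketched below. This is an illustrative extension, not the poster's code: `merge_intervals` is a hypothetical helper name, and it compares each start against the running maximum of all earlier ends (via cummax()) rather than only the immediately preceding end, so that an interval nested inside an earlier, longer one is still absorbed.

```r
# Hypothetical helper: collapse overlapping [start, end] intervals.
merge_intervals <- function(start, end) {
  o <- order(start, end)
  s <- start[o]
  e <- end[o]
  # Largest end seen among all earlier intervals (-Inf for the first one).
  prev_max_end <- c(-Inf, head(cummax(e), -1))
  # A new group begins wherever an interval starts after everything before it ended.
  grp <- cumsum(s > prev_max_end)
  data.frame(start = as.vector(tapply(s, grp, min)),
             end   = as.vector(tapply(e, grp, max)))
}

# The four intervals from the thread (samples A, B, C, D).
res <- merge_intervals(start = c(5, 7, 1, 16), end = c(10, 18, 4, 20))
res
```

As in the `>=` comparison in the message above, intervals that merely touch (one ends where the next starts) are merged, while (1,4) and (5,20) stay separate. Everything is vectorised, so it avoids expanding each interval into its individual integers as the unique(c(...)) attempt did.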
Re: [R] merge function in R?
On Fri, 13 Aug 2010, fishkbob wrote:

> So I have a bunch of c(start,end) points and want to consolidate them into as few c(start,end) as possible. For example:
>
> sample start end
> A      5     10
> B      7     18
> C      1     4
> D      16    20
>
> I'd want the function to return the two distinct sets (1,4) and (5,20)
> Is there an R function that already does this?

Yes. See the reduce() function in the IRanges package on BioConductor.

See pages 11-12 of http://www.bioconductor.org/packages/2.6/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf

HTH,
Chuck

> or should I write my own? (how would I go about that?)
>
> -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324684.html

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901