Re: [R] Merge function - Return NON matches
Hi there,

I've tried the noted solutions:

> If you do `no <- unlist(hrc78_clm_no)`, do you get a character vector of
> claim numbers you want to exclude? If so, then
> `subset(whatever, !CLAIM_NO %in% no)` should work.

I converted the CLAIM_NO list to a character, with

hrc78_clmno_char <- format(as.character(hrc78_clm_no))
is.character(hrc78_clmno_char)
[1] TRUE

Then I applied your code (above), which didn't work. Thanks though!

Thanks for the dput() help. Here is truncated output of the list (its class is data.frame; I call it a list for communication's sake) & data.frame. Again, your help is most appreciated!

Goal: merge the list & data.frame together. Output the data.frame, but only with rows where the CLAIM_NO variable between the list & data.frame *do not match*.

*The List*

truncated_list <- hrc78_clm_no[1:100,] #So you can see consistency in previously-mentioned variables
truncated_list <- structure(list(CLAIM_NO = c(20L, 83L, 1440L, 4439L, 7002L, 9562L,
10463L, 12503L, 16195L, 22987L, 30760L, 32108L, 32640L, 33045L, 36241L,
37091L, 37934L, 38663L, 39456L, 40544L, 40630L, 40679L, 40734L, 43054L,
53483L, 54155L, 56151L, 58113L, 61050L, 62056L, 63014L, 68486L, 68541L,
69298L, 69983L, 73379L, 76810L, 79975L, 91124L, 97697L, 100524L, 105808L,
112659L, 112955L, 113422L, 114522L, 124159L, 133566L, 135167L, 137387L,
137954L, 138186L, 144574L, 148573L, 150013L, 152193L, 154680L, 155414L,
165954L, 171223L, 175077L, 176359L, 177656L, 178155L, 182250L, 182393L,
182832L, 184245L, 185542L, 186038L, 186087L, 186098L, 186294L, 186550L,
186897L, 187025L, 190180L, 191472L, 192593L, 196207L, 196689L, 197372L,
197537L, 197590L, 197730L, 197874L, 198294L, 198750L, 198823L, 199076L,
199233L, 199284L, 199468L, 199661L, 199913L, 200150L, 200279L, 200473L,
200927L, 202407L)), .Names = c("CLAIM_NO"), class = "data.frame")

*The (multi-column) data.frame, but greatly truncated*

truncated_dataframe <- bestPartAreadmin[1:25, 1:4]
truncated_dataframe <- structure(list(DESY_SORT_KEY = c(10193L, 10193L, 10193L, 10574L,
10574L, 19213L, 19213L, 19213L, 100026636L, 100040718L, 100055111L,
100060558L, 100060558L, 100060558L, 100072978L, 100096346L, 100130451L,
100168782L, 100168782L, 100168782L, 100168782L, 100168782L, 100168782L,
100174887L, 100177905L), PRVDR_NUM = structure(c(1368L, 1353L, 1406L,
149L, 149L, 1362L, 1393L, 1367L, 1557L, 1370L, 1360L, 1362L, 1362L, 1362L,
1372L, 1358L, 193L, 196L, 196L, 61L, 166L, 196L, 196L, 311L, 1363L),
.Label = c("010001", "010006", "010015", "010016", "010029", "010033",
"010034", "010035", "010039", "010040", "010046", "010049", "010083",
"010092", "010108", "010131", "010149", "01S001", "01S033", "01S046",
"01S145", "020001", "020006", "020012", "020017", "021306", "021311",
"030002", "030006", "030007", "030010", "030011", "030012", "030013",
"030014", "030016", "030023", "030024", "030030", "030033", "030036",
"030037", "030038", "030043", "030055", "030061", "030062", "030064",
"030065", "030067", "030069", "030078", "030083", "030085", "030087",
"030088", "030089", "030092", "030093", "030100", "030101", "030102",
"030103", "030105", "030108", "030110", "030111", "030114", "030115",
"030117", "030118", "030119", "030120", "030121", "030122", "030123",
"030126", "030128", "031300", "031305", "031311", "032000", "032001",
"032002", "032006", "033025", "033028", "033029", "033032", "033034",
"033036", "034004", "034013", "034020", "034024", "03S002", "03S006",
"03S007", "03S016", "03S022", "03S023", "03S089", "03T002", "03T055",
"03T061", "03T069", "03T093", "03T103", "03T114", "03T117", "03T126",
"040004", "040007", "040010", "040011", "040016", "040022", "040026",
"040027", "040029", "040036", "040041", "040047", "040055", "040062",
"040072", "040080", "040084", "040088", "040091", "040114", "040118",
"040119", "043028", "044005", "04S027", "04S084", "04T041", "04T062",
"04T119", "050002", "050006", "050007", "050008", "050009", "050013",
"050014", "050016", "050017", "050018", "050022", "050024", "050025",
"050026", "050030", "050036", "050038", "050039", "050040", "050042",
"050043", "050045", "050046", "050047", "050055", "050056", "050057",
"050058", "050060", "050063", "050069", "050070", "050071", "050073",
"050075", "050076", "050077", "050078", "050079", "050082", "050084",
"050089", "050090", "050091", "050093", "050099", "050100", "050101",
"050102", "050103", "050104", "050107", "050108", "050110", "050111",
"050112", "050113", "050115", "050116", "050118", "050121", "050122",
"050124", "050125", "050126", "050128", "050129", "050131", "050132",
"050133", "050135", "050136", "050137", "050138", "050139", "050140",
"050145", "050146", "050149", "050150", "050152", "050153", "050158",
"050159", "050168", "050169", "050174", "050179", "050180", "050188",
"050191", "050193", "050195", "050196", "050197", "050204", "050211",
"050219", "050222", "050224", "050225", "050226", "050228", "050230",
"050231", "050232", "050234", "050235", "050236", "050238", "050239",
"050242", "050243", "050245", "050248", "050254", "050257", "050261",
"050262", "050264", "050272", "050276", "050277", "050278", "050279",
"050280", "050283", "050289", "050290", "050291", "050292", "050295",
"050296", "050298", "050300", "050301", "050305", "050308", "050309",
"050313", "050315", "050320", "050324", "050327", "050329", "050334",
"050335", "050336", "050342", "050348", "050351", "050352", "050353",
"050359", "050360", "050366", "050367", "050373", "050376", "050378",
"050380", "050382", "050385", "050390", "050393",
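One possible reason the character conversion in this message did not help, sketched on a tiny made-up data.frame (`claims` is a hypothetical stand-in, not the poster's `hrc78_clm_no`): `as.character()` applied to a whole data.frame returns one string per column rather than one string per claim number, so `is.character()` is TRUE even though the individual values are no longer separate elements.

```r
# Hypothetical one-column data.frame standing in for the claim-number "list"
claims <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# as.character() on the data.frame gives one string for the whole column
as_text <- as.character(claims)
length(as_text)

# Extracting the column keeps one element per claim number
claim_vec <- claims$CLAIM_NO
length(claim_vec)

# unlist() flattens the single column into the same plain vector
flat <- unlist(claims, use.names = FALSE)
identical(flat, claim_vec)
```

Under these assumptions, either `claims$CLAIM_NO` or `unlist(claims)` is the kind of object that can sit on the right-hand side of `%in%`.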
Re: [R] Merge function - Return NON matches
Hi

If you used shorter names for your objects, you would probably get more readable advice.

Is this what you wanted?

truncated_dataframe[truncated_dataframe$CLAIM_NO %in% setdiff(truncated_dataframe$CLAIM_NO, truncated_list$CLAIM_NO),]

Regards
Petr

> Hi there,
>
> I've tried the noted solutions:
>
> > If you do `no <- unlist(hrc78_clm_no)`, do you get a character vector of
> > claim numbers you want to exclude? If so, then
> > `subset(whatever, !CLAIM_NO %in% no)` should work.
>
> I converted the CLAIM_NO list to a character, with
>
> hrc78_clmno_char <- format(as.character(hrc78_clm_no))
> is.character(hrc78_clmno_char)
> [1] TRUE
>
> Then I applied your code (above), which didn't work. Thanks though!
>
> Thanks for the dput() help. Here is truncated output of the list (its class
> is data.frame; I call it a list for communication's sake) & data.frame.
> Again, your help is most appreciated!
>
> Goal: merge the list & data.frame together. Output the data.frame, but only
> with rows where the CLAIM_NO variable between the list & data.frame
> *do not match*.
Re: [R] Merge function - Return NON matches
Hi again, Petr, your solution worked! Thanks everyone for your input. I'll look more into setdiff. Cheers! -- View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4593101.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
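For readers following up on `setdiff`, here is a minimal sketch of the approach that worked, run on made-up stand-ins (`df`, `lst` and their values are assumptions for illustration, not the poster's data). It also shows that, on this toy data, negating `%in%` against the extracted column selects the same rows.

```r
# Hypothetical stand-ins for truncated_dataframe and truncated_list
df <- data.frame(
  CLAIM_NO  = c(20L, 21L, 22L, 83L, 84L, 84L),
  PRVDR_NUM = c("290003", "290045", "29T003", "050017", "050017", "050017")
)
lst <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Claim numbers present in df but absent from lst (setdiff returns unique values)
only_in_df <- setdiff(df$CLAIM_NO, lst$CLAIM_NO)

# Keep every row whose claim number is in that set, repeated claims included
no_match <- df[df$CLAIM_NO %in% only_in_df, ]

# Equivalent selection: drop rows whose claim number appears in lst
no_match2 <- df[!df$CLAIM_NO %in% lst$CLAIM_NO, ]

no_match
```

The key point in both forms is that the comparison uses the `CLAIM_NO` column as a vector (`lst$CLAIM_NO`), not the data.frame as a whole.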
[R] Merge function - Return NON matches
Hi there,

I wish to merge a common variable between a list and a data.frame & return rows via the data.frame where there is NO match. Here are some details:

The list, where the variable/col.name = CLAIM_NO

CLAIM_NO
20
83
1440
4439
7002
...

dim(hrc78_clm_no)
[1] 6678 1

The data.frame, where there exists a variable with the same name, CLAIM_NO.

dim(bestPartAreadmin)
[1] 13068 93

I wish to merge the two together & only return a data.frame where there is NO match in the CLAIM_NO between both files.

I've read & tried code via the merge function. If merge can do this, I'm missing something with the available options. I'm figuring something like:

clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO", .. .. ..)

Your help is most appreciated!

--
View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590755.html
Re: [R] Merge function - Return NON matches
Hi,

To increase the chances of you getting help on this one, please give example data (a small data.frame, a small list) that you are trying to do this on, and also show the desired output.

Whip these variables up in your R workspace and paste the output of `dput` for each into your follow-up email.

It's hard (for me, anyways) to get what you're after ... I'm guessing something that ends up looking like this will end up being one solution:

subset(your.df, !CLAIM_NO %in% `something`)

but it's hard for me to tell from where I'm sitting.

-steve

On Thu, Apr 26, 2012 at 3:33 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi there,
>
> I wish to merge a common variable between a list and a data.frame & return
> rows via the data.frame where there is NO match. Here are some details:
>
> The list, where the variable/col.name = CLAIM_NO
>
> CLAIM_NO
> 20
> 83
> 1440
> 4439
> 7002
> ...
>
> dim(hrc78_clm_no)
> [1] 6678 1
>
> The data.frame, where there exists a variable with the same name, CLAIM_NO.
>
> dim(bestPartAreadmin)
> [1] 13068 93
>
> I wish to merge the two together & only return a data.frame where there is
> NO match in the CLAIM_NO between both files.
>
> I've read & tried code via the merge function. If merge can do this, I'm
> missing something with the available options. I'm figuring something like:
>
> clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO", .. .. ..)
>
> Your help is most appreciated!

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Merge function - Return NON matches
Hi Steve,

Thanks for replying. Here's a small piece of the data.frame:

bestPartAreadmin[1:5,1:6]
  DESY_SORT_KEY PRVDR_NUM CLM_THRU_DT CLAIM_NO NCH_NEAR_LINE_REC_IDEN_CD NCH_CLM_TYPE_CD
1         10193    290003    20090323       20                         V              60
2         10193    290045    20091124       21                         V              60
3         10193    29T003    20090401       22                         V              60
4         10574    050017    20090527       83                         V              60
5         10574    050017    20090921       84                         V              60

There are 93 columns in total in the data.frame, so these are the first six, where you can see CLAIM_NO.

I wish for the resultant data.frame to look just like the data.frame above, but containing only the rows whose CLAIM_NO values differ from/don't match the CLAIM_NO values in the list (hrc78_clm_no).

Does this help? Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590810.html
Re: [R] Merge function - Return NON matches
Hi again,

I tried the sample code like this:

merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
dim(merged_clmno)
[1] 13068 93

Note that:

dim(bestPartAreadmin)
[1] 13068 93

So, no change between the original data.frame (bestPartAreadmin) & the (should be) fewer-rows merged_clmno data.frame.

Any further help is most appreciated!

--
View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590851.html
Re: [R] Merge function - Return NON matches
You'd get better help if you actually did as Steve requested and provided sample data (a reproducible example!) using dput().

But since you didn't:

fakedata <- data.frame(a = 1:5, b=11:15, c=c(1,1,1,2,2))
fakedata
  a  b c
1 1 11 1
2 2 12 1
3 3 13 1
4 4 14 2
5 5 15 2
notb <- c(12, 14, 15)
subset(fakedata, !b %in% notb)
  a  b c
1 1 11 1
3 3 13 1

Since you say that doesn't work for you, you absolutely have to provide us with a reproducible example for anyone to be able to diagnose your problem.

Sarah

On Thu, Apr 26, 2012 at 4:12 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi again,
>
> I tried the sample code like this:
>
> merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
> dim(merged_clmno)
> [1] 13068 93
>
> Note that:
>
> dim(bestPartAreadmin)
> [1] 13068 93
>
> So, no change between the original data.frame (bestPartAreadmin) & the
> (should be) fewer-rows merged_clmno data.frame.
>
> Any further help is most appreciated!

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] Merge function - Return NON matches
Hi,

As Sarah reiterated -- it'd *really* be helpful if you give us data we can actually work with.

That having been said:

On Thu, Apr 26, 2012 at 4:12 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi again,
>
> I tried the sample code like this:
>
> merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
> dim(merged_clmno)
> [1] 13068 93
>
> Note that:
>
> dim(bestPartAreadmin)
> [1] 13068 93
>
> So, no change between the original data.frame (bestPartAreadmin) & the
> (should be) fewer-rows merged_clmno data.frame.

Your original email said you had a list that contains the CLAIM_NOs you want to exclude.

Is `hrc78_clm_no` this list -- does it only have claim numbers? Passing a list into the subset call after `%in%` won't work.

If you do `no <- unlist(hrc78_clm_no)`, do you get a character vector of claim numbers you want to exclude? If so, then `subset(whatever, !CLAIM_NO %in% no)` should work.

HTH,
-steve

> Any further help is most appreciated!

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
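The point about passing a list to `%in%` can be illustrated with a small sketch. The objects `df` and `lst` and their values are made up for this example; the behaviour shown is what one would expect from base R's `match()`, which compares against a list (and a data.frame is a list of columns) only after turning each element into a single string.

```r
# Hypothetical data, loosely modelled on the thread
df  <- data.frame(CLAIM_NO = c(20L, 21L, 22L, 83L, 84L))
lst <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Table is the whole data.frame: no claim number matches, so no row is dropped
kept_all <- subset(df, !CLAIM_NO %in% lst)
nrow(kept_all)

# Table is the column flattened to a plain vector: matching rows are dropped
excl <- unlist(lst, use.names = FALSE)
kept <- subset(df, !CLAIM_NO %in% excl)
kept$CLAIM_NO
```

This would be consistent with the unchanged `dim()` reported earlier in the thread: with a data.frame on the right-hand side, every row passes the `!... %in% ...` filter.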
Re: [R] Merge function - Return NON matches
Hi there,

Thanks for your responses. I haven't used/heard of dput() before. I'm looking it up & understanding how it works.

Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4591003.html
Re: [R] Merge function - Return NON matches
Assuming everything else is good, the all or all.x or all.y arguments to merge() should do what I think you're asking for. You did read the help page for merge, right?

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 4/26/12 12:33 PM, RHelpPlease rrum...@trghcsolutions.com wrote:
> Hi there,
>
> I wish to merge a common variable between a list and a data.frame & return
> rows via the data.frame where there is NO match. Here are some details:
>
> The list, where the variable/col.name = CLAIM_NO
>
> CLAIM_NO
> 20
> 83
> 1440
> 4439
> 7002
> ...
>
> dim(hrc78_clm_no)
> [1] 6678 1
>
> The data.frame, where there exists a variable with the same name, CLAIM_NO.
>
> dim(bestPartAreadmin)
> [1] 13068 93
>
> I wish to merge the two together & only return a data.frame where there is
> NO match in the CLAIM_NO between both files.
>
> I've read & tried code via the merge function. If merge can do this, I'm
> missing something with the available options. I'm figuring something like:
>
> clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO", .. .. ..)
>
> Your help is most appreciated!
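On its own, `all.x = TRUE` keeps the unmatched rows but also keeps the matched ones, so one extra step is needed to return only the non-matches. A minimal sketch of one way to do that with `merge()`, assuming made-up data and a helper flag column (`in_list` is a hypothetical name introduced here, not something from the thread):

```r
# Hypothetical stand-ins for bestPartAreadmin and hrc78_clm_no
df  <- data.frame(CLAIM_NO      = c(20L, 21L, 22L, 83L, 84L),
                  DESY_SORT_KEY = c(10193L, 10193L, 10193L, 10574L, 10574L))
lst <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Flag column marks rows that come from the lookup table
lst$in_list <- TRUE

# Left join: every row of df is kept; rows without a partner get NA in in_list
joined <- merge(df, lst, by = "CLAIM_NO", all.x = TRUE)

# Keep only the rows with no partner, and only the original columns
no_match <- joined[is.na(joined$in_list), names(df)]
no_match
```

Note that `merge()` sorts on the key by default, so the row order may differ from the original data.frame; the `%in%`-based filters elsewhere in the thread preserve the original order.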
Re: [R] Merge function - Return NON matches
# dput() example
# let's say you have data called y, like this:
y
  sp1 sp2 sp3 sp4
d   0   0   0   0
e   0   0   0   0
f   0   0   0   0

# ok, so do this:
dput(y)
structure(list(sp1 = c(0, 0, 0), sp2 = c(0, 0, 0), sp3 = c(0, 0, 0), sp4 = c(0, 0, 0)), .Names = c("sp1", "sp2", "sp3", "sp4"), row.names = c("d", "e", "f"), class = "data.frame")

# now copy and paste that into your R terminal to see why it is so nice.

RHelpPlease wrote
> Hi there,
>
> Thanks for your responses. I haven't used/heard of dput() before. I'm
> looking it up & understanding how it works.
>
> Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4591189.html
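To make the "copy and paste" step concrete, here is a small sketch of the round trip: the text that `dput()` prints is R code that rebuilds the object. The object `y` mirrors the example above; `txt` and `y2` are names introduced only for this illustration, and `capture.output()` stands in for copying the printed text out of an email.

```r
# Rebuild the example object from the message above
y <- data.frame(sp1 = c(0, 0, 0), sp2 = c(0, 0, 0),
                sp3 = c(0, 0, 0), sp4 = c(0, 0, 0),
                row.names = c("d", "e", "f"))

# Capture what dput() prints, as if it had been copied from an email
txt <- paste(capture.output(dput(y)), collapse = "\n")

# Recreate the object from that text, as a reader pasting it into R would
y2 <- eval(parse(text = txt))

# Check that the recreated copy matches the original
all.equal(y, y2)
```

For large objects, `dput(head(x))` or `dput(x[1:25, 1:4])` keeps the pasted text short while still giving helpers something they can run.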