Re: [R] counting duplicate items that occur in multiple groups

Avi Gross via R-help Tue, 17 Nov 2020 16:00:22 -0800

Many problems can often be solved with some thought by using the right tools, 
such as the ones from the tidyverse.

Without giving a specific answer, you might want to think about using the 
group_by() functionality in a pipeline that would lump together all rows 
matching say having the same value in several columns. Then in something like a 
mutate() or summarize() you can use special functions like n() that return how 
many rows exist within each grouping. There are many more such verbs and 
features that let you build up something, often by removing the grouping along 
the way and perhaps adding some other form of grouping including the new 
rowwise() that then lets you do things across columns on a row at a time and so 
on.

I think the point is to think of steps that lead to a result that can be used 
in the next step and so on. 

And, for some problems, you can  think outside the pipelines and create 
multiple intermediate data.frames with parts of what you will need and then 
combine them with joins or whatever it takes to efficiently get a result, or by 
brute force. Sometimes (as when making graphs) you might want to convert data 
between forms that are often called long versus wide. 

Yes, plenty can be done in base R or using other packages. But a good set of 
tools might be part of what you need to investigate.

Of course, others can chime in suggesting that there are negatives to dplyr and 
other aspects of the tidyverse and they would be right too. 

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Tom Woolman
Sent: Tuesday, November 17, 2020 6:30 PM
To: Bill Dunlap <williamwdun...@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] counting duplicate items that occur in multiple groups

Hi Bill. Sorry to be so obtuse with the example data, I was trying (too hard) 
not to share any actual values so I just created randomized values for my 
example; of course I should have specified that the random values would not 
provide the expected problem pattern. I should have just used simple dummy 
codes as Bill Dunlap did.

So per Bill's example data for Data1, the expected (hoped for) output should be:

  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1      0
2     V2      A2      3
3     V3      A2      3
4     V4      A2      3

Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.  
The value is 3 for V2, V3 and V4 because they each share bank account A2.

Likewise, in the Data2 frame, the same logic applies:

  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1     0
2     V2      A2     3
3     V3      A2     3
4     V1      A2     3
5     V4      A3     0
6     V2      A4     0

Thanks!

Quoting Bill Dunlap <williamwdun...@gmail.com>:

> What should the result be for
>   Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> Account=c("A1","A2","A2","A2"))
> ?
>
> Must each vendor have only one account?  If not, what should the 
> result be for
>    Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> Account=c("A1","A2","A2","A2","A3","A4"))
> ?
>
> -Bill
>
> On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twool...@ontargettek.com>
> wrote:
>
>> Hi everyone.  I have a dataframe that is a collection of Vendor IDs 
>> plus a bank account number for each vendor. I'm trying to find a way 
>> to count the number of duplicate bank accounts that occur in more 
>> than one unique Vendor_ID, and then assign the count value for each 
>> row in the dataframe in a new variable.
>>
>> I can do a count of bank accounts that occur within the same vendor 
>> using dplyr and group_by and count, but I can't figure out a way to 
>> count duplicates among multiple Vendor_IDs.
>>
>>
>> Dataframe example code:
>>
>>
>> #Create a sample data frame:
>>
>> set.seed(1)
>>
>> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID =
>> sample(1:10000))
>>
>>
>>
>>
>> Thanks in advance for any help.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] counting duplicate items that occur in multiple groups

Reply via email to