You could try this:

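dat2 <- read.table(text='
case pin some_data
"A"  "1" "data"
"A"  "2" "data"
"A"  "1" "data"
"A"  "2" "data"
"B"  "1" "data"
"B"  "2" "data"
', sep="", header=TRUE, stringsAsFactors=FALSE)

# Keep the first row of each case/pin combination
dat2[!duplicated(dat2[, 1:2]), ]
#  case pin some_data
#1    A   1      data
#2    A   2      data
#5    B   1      data
#6    B   2      data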

# Assuming the third column differs between rows that share the same `case` and `pin`:
dat2[row.names(unique(dat2[, 1:2])), ]
#  case pin some_data
#1    A   1      data
#2    A   2      data
#5    B   1      data
#6    B   2      data

# If `some_data` is the same for the duplicated rows, unique() on the whole data frame is enough:
unique(dat2)
#  case pin some_data
#1    A   1      data
#2    A   2      data
#5    B   1      data
#6    B   2      data
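
For the recidivism check described in the original question (does a PIN show up under more than one CASE?), something along these lines might work once the duplicates are gone. This is only a sketch: the toy data frame copies the example from this thread, the extra row (case "C", pin 3) is a made-up single arrest added so that both outcomes appear, and the column names `n_cases` and `repeat_offender` are hypothetical.

```r
# Toy data copied from the example in this thread, plus one hypothetical
# single-arrest row (case "C", pin 3) so that both outcomes are visible.
dat2 <- data.frame(
  case = c("A", "A", "A", "A", "B", "B", "C"),
  pin = c(1, 2, 1, 2, 1, 2, 3),
  some_data = "data",
  stringsAsFactors = FALSE
)

# Keep the first row of each case/pin combination (vectorized, no loop).
dedup <- dat2[!duplicated(dat2[, c("case", "pin")]), ]

# Count the remaining rows per pin. After de-duplication each row is a
# distinct case, so this is the number of cases per pin.
dedup$n_cases <- ave(rep(1L, nrow(dedup)), dedup$pin, FUN = sum)

# Flag pins that appear under more than one case; single arrests stay in
# the data with the flag set to FALSE.
dedup$repeat_offender <- dedup$n_cases > 1

dedup
```

Both `duplicated()` and `ave()` are vectorized, so no explicit loop over the rows should be needed, even for several hundred thousand observations.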



First off, I'm sure that this is posted somewhere but I've not 
been able to find what I'm looking for. Please forgive the duplication 
and thank you for your help!!!! 

I have a crime dataset of over 500k observations in one file. To
 simplify my problem, I have a dataframe that has a "case" ID in one 
column, a personal ID number (pin) in another, and associated "data" in 
subsequent columns. 

     case pin some_data 
[1,] "A"  "1" "data"   
[2,] "A"  "2" "data"   
[3,] "A"  "1" "data"   
[4,] "A"  "2" "data"   
[5,] "B"  "1" "data"   
[6,] "B"  "2" "data"   

I would like to subset the data so that only unique PIN and CASE combinations are left, 
along with their associated data: 

     case pin some_data 
[1,] "A"  "1" "data"   
[2,] "A"  "2" "data"   
[5,] "B"  "1" "data"   
[6,] "B"  "2" "data"   

I'm teaching myself how to program in R and I'm thinking that I want a loop to 
say something like: 
- find and keep first row of unique PIN & CASE 
- if PIN is duplicate but CASE is different, keep first row of dupe PIN & new 

Longer Explanation: 
The PIN identifies an arrested offender. I want to check and see if 
there was recidivism, repeat offenses and arrests, for each 
offender/PIN. The way I can do that is by checking whether a PIN has 
multiple CASE numbers. I also want to keep the single arrests in the 
dataset too. I have over 6 million cases for several years. 

I hope this makes sense. I've been banging my head on this one for a while and 
would really appreciate the help!!
