I have data similar to this:

Location Surveyor Result
A        1         83
A        2         76
A        3         45
B        1         71
B        4         67
C        2         23
C        5         12
D        3         34
E        4         75
F        4         46
G        5         90
etc (5 million records in total)

I need to divide the data into many subsets and then, for each subset, randomly 
select 5 different locations and 5 different surveyors (one at each of the 5 
randomly selected locations).

The function I have written basically picks five locations and then one 
surveyor at each location, checks that there are five different surveyors and, 
if there aren't, tries again.  The problem is that for some subsets this 
doesn't work.
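
In outline, the approach looks something like the sketch below.  This is a 
simplified illustration rather than my actual code: the function name 
pick_five, its arguments n and max_tries, and the assumption that a subset d 
has one row per location/surveyor pair are all made up for the example.

```r
# Sketch only: 'd' is one subset with columns Location and Surveyor,
# assumed to hold one row per location/surveyor pair.
# pick_five, n and max_tries are illustrative names, not my real code.
pick_five <- function(d, n = 5, max_tries = 100) {
  loc  <- as.character(d$Location)
  locs <- unique(loc)
  if (length(locs) < n) return(NULL)          # too few locations
  for (i in seq_len(max_tries)) {
    # n distinct locations, drawn without replacement
    chosen <- locs[sample.int(length(locs), n)]
    # one randomly chosen row (i.e. surveyor) at each chosen location
    rows <- vapply(chosen, function(l) {
      idx <- which(loc == l)
      idx[sample.int(length(idx), 1)]
    }, integer(1), USE.NAMES = FALSE)
    # accept only if all n surveyors are different
    if (!anyDuplicated(d$Surveyor[rows])) return(d[rows, ])
  }
  NULL                                        # gave up after max_tries
}
```

Calling pick_five(d) on a subset returns the selected rows, or NULL if no 
valid selection was found within max_tries attempts.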

Some subsets don't have enough locations, enough surveyors, or both, but this 
can be checked for easily.  The problem subsets do have enough locations and 
surveyors but still cannot produce 5 locations each with a different surveyor.  
The matrix below demonstrates such a subset:
 
                  locations
                  A B C D E
                1 1 0 0 0 0
Surveyors       2 1 0 0 0 0
                3 1 0 0 0 0
                4 1 0 0 0 0
                5 1 1 1 1 1
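
For reference, a matrix like this can be built from a subset with table().  
The data frame bad below is a hypothetical subset constructed only to 
reproduce the matrix above; the names bad, m and only5 are illustrative.

```r
# Hypothetical subset matching the matrix above (illustrative names only)
bad <- data.frame(Location = c("A","A","A","A","A","B","C","D","E"),
                  Surveyor = c(1, 2, 3, 4, 5, 5, 5, 5, 5))

# Rows are surveyors, columns are locations; TRUE where a record exists
m <- unclass(table(bad$Surveyor, bad$Location)) > 0

colSums(m)   # surveyors per location: A has 5, B to E have 1 each

# Locations whose only surveyor is surveyor 5
only5 <- colnames(m)[colSums(m) == 1 & m["5", ]]
```

There are 5 locations and 5 surveyors, but B, C, D and E can each only be 
covered by surveyor 5, so no selection of 5 locations can have 5 different 
surveyors.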

I cannot think of a way to check for such a situation, so I have simply 
programmed the function to give up after 100 attempts if it can't find a 
solution.  This is not very satisfactory, however, as the analysis takes a 
very long time to run, and it would also be very useful for me to know how 
many suitable solutions there are.

I reckon some of you clever folk out there must be able to think of a better 
solution.

Any advice appreciated,

Ben

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
