Re: [R] Constrained vector permutation
Chuck, Thanks for the reference. MCMC with swapping was my original goal and I think I'll go with that in the long run--although using sample() has worked out for now. I was initially concerned that checking all choose(n,2) possible swaps would slow down the process, but in my case for choose(30,2) = 435 this seems not unreasonable. I wonder if for larger vectors an alternative would be desired. Thanks to everyone for your help-- Andy On Fri, Jan 29, 2010 at 8:23 PM, Charles C. Berry cbe...@tajo.ucsd.eduwrote: On Fri, 29 Jan 2010, Andrew Rominger wrote: Being reasonably sure that all valid permutations are equally probable is important to me. I've played around with search algorithms in permuting contingency tables and find that possible solutions decrease rapidly once one starts assigning values, particularly if small values are assigned first, so it would seem all solutions are not equally probable (not only that but one frequently encounters dead ends where there are values left to assign and no allowable place to put them). As such I think I'd opt to use sample()... several times if needed. To clarify, yes, I only need one valid permutation, the idea is I'll generate 1000s of ordered vectors, and then for each one generate one valid permutation. Thanks very much for the help and insights-- Andy Andy, If you have some sense of importance sampling and/or MCMC you might look at Zaman and Simberloff (2002, Environmental and Ecological Statistics 9, 405--421). which concerns sampling a binary matrix with fixed margins - not quite your problem, but akin to it in being a combinatorial nightmare without an obvious direct solution of workable size for real problems. They define a neighborhood for each allowable matrix s.t. swapping a pair of 1's at ij and kl with a pair of 0's at il and kj (which doesn't violate the margin constraints) leads to a member of the neighborhood. IIRC, the size of the neighborhood and the sizes of the neighborhoods of the members of its neighborhood determine the probabilities of staying put or moving to get the next element of the MCMC chain and which member of the neighborhood to choose. I suppose something like that (i.e. defining neighborhoods of allowable permutations, measuring their size, and using this to guide sampling or develop importance weights) might apply in your case. Maybe something like this: start with an ordering of your n-vector that conforms to the constraints, look at all the choose(n,2) pairs of elements and check which of them could be exchanged to yield another conforming ordering; the allowable swaps define the neighborhood, and their number is its size. So, the idea is to develop an MCMC sampler. Run it for each ordered vector to get past the burn in, then take one value from it. HTH, Chuck On Thu, Jan 28, 2010 at 3:04 PM, Thomas Lumley tlum...@u.washington.edu wrote: On Thu, 28 Jan 2010, Jason Smith wrote: It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster and so could be repeated until an acceptable vector is found. What do you think? Thanks-- Andy I think I am not understanding what your ultimate goal is so I'm not sure I can give you appropriate advice. Are you looking for a single valid permutation or all of them? Since that constraint sets a ceiling on each subsequent value, it seems like you could solve this problem more easily and quickly by using a search strategy instead of random sampling or generating all permutations then testing. The constraint will help prune the search space so you only generate valid permutations. Once you are examining a particular element you can determine which of the additional elements would be valid, so only consider those. It's easy to generate valid permutations this way. It does not appear straightforward to ensure that all valid permutations are sampled with equal probability, which I thought was part of the specification of the problem. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE
Re: [R] Constrained vector permutation
Being reasonably sure that all valid permutations are equally probable is important to me. I've played around with search algorithms in permuting contingency tables and find that possible solutions decrease rapidly once one starts assigning values, particularly if small values are assigned first, so it would seem all solutions are not equally probable (not only that but one frequently encounters dead ends where there are values left to assign and no allowable place to put them). As such I think I'd opt to use sample()... several times if needed. To clarify, yes, I only need one valid permutation, the idea is I'll generate 1000s of ordered vectors, and then for each one generate one valid permutation. Thanks very much for the help and insights-- Andy On Thu, Jan 28, 2010 at 3:04 PM, Thomas Lumley tlum...@u.washington.eduwrote: On Thu, 28 Jan 2010, Jason Smith wrote: It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster and so could be repeated until an acceptable vector is found. What do you think? Thanks-- Andy I think I am not understanding what your ultimate goal is so I'm not sure I can give you appropriate advice. Are you looking for a single valid permutation or all of them? Since that constraint sets a ceiling on each subsequent value, it seems like you could solve this problem more easily and quickly by using a search strategy instead of random sampling or generating all permutations then testing. The constraint will help prune the search space so you only generate valid permutations. Once you are examining a particular element you can determine which of the additional elements would be valid, so only consider those. It's easy to generate valid permutations this way. It does not appear straightforward to ensure that all valid permutations are sampled with equal probability, which I thought was part of the specification of the problem. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained vector permutation
On Fri, 29 Jan 2010, Andrew Rominger wrote: Being reasonably sure that all valid permutations are equally probable is important to me. I've played around with search algorithms in permuting contingency tables and find that possible solutions decrease rapidly once one starts assigning values, particularly if small values are assigned first, so it would seem all solutions are not equally probable (not only that but one frequently encounters dead ends where there are values left to assign and no allowable place to put them). As such I think I'd opt to use sample()... several times if needed. To clarify, yes, I only need one valid permutation, the idea is I'll generate 1000s of ordered vectors, and then for each one generate one valid permutation. Thanks very much for the help and insights-- Andy Andy, If you have some sense of importance sampling and/or MCMC you might look at Zaman and Simberloff (2002, Environmental and Ecological Statistics 9, 405--421). which concerns sampling a binary matrix with fixed margins - not quite your problem, but akin to it in being a combinatorial nightmare without an obvious direct solution of workable size for real problems. They define a neighborhood for each allowable matrix s.t. swapping a pair of 1's at ij and kl with a pair of 0's at il and kj (which doesn't violate the margin constraints) leads to a member of the neighborhood. IIRC, the size of the neighborhood and the sizes of the neighborhoods of the members of its neighborhood determine the probabilities of staying put or moving to get the next element of the MCMC chain and which member of the neighborhood to choose. I suppose something like that (i.e. defining neighborhoods of allowable permutations, measuring their size, and using this to guide sampling or develop importance weights) might apply in your case. Maybe something like this: start with an ordering of your n-vector that conforms to the constraints, look at all the choose(n,2) pairs of elements and check which of them could be exchanged to yield another conforming ordering; the allowable swaps define the neighborhood, and their number is its size. So, the idea is to develop an MCMC sampler. Run it for each ordered vector to get past the burn in, then take one value from it. HTH, Chuck On Thu, Jan 28, 2010 at 3:04 PM, Thomas Lumley tlum...@u.washington.eduwrote: On Thu, 28 Jan 2010, Jason Smith wrote: It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster and so could be repeated until an acceptable vector is found. What do you think? Thanks-- Andy I think I am not understanding what your ultimate goal is so I'm not sure I can give you appropriate advice. Are you looking for a single valid permutation or all of them? Since that constraint sets a ceiling on each subsequent value, it seems like you could solve this problem more easily and quickly by using a search strategy instead of random sampling or generating all permutations then testing. The constraint will help prune the search space so you only generate valid permutations. Once you are examining a particular element you can determine which of the additional elements would be valid, so only consider those. It's easy to generate valid permutations this way. It does not appear straightforward to ensure that all valid permutations are sampled with equal probability, which I thought was part of the specification of the problem. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained vector permutation
Andrew Rominger ajrominger at gmail.com writes: I'm trying to permute a vector of positive integers 0 with the constraint Hi Andy I'm not sure if you are explicitly wanting to use a sampling approach, but the gtools library has a permutations function (found by ??permutation then ? gtools::combinations). Hope this helps, Jason Smith Here is the script I used: # Constraint # f(n_i) = 2 * f(n_(i-1)) # # Given a start value and the number of elements # recursively generate a vector representing the # maximum values each index is allowed # f - function(value, num_elements) { #cat(paste(f(,value,,,num_elements,)\n)) if (num_elements 1) { value; } else { z - c(value,f(2*value, num_elements-1)) } } # Generate base vector v - 2:6 # Calculate constraint vector v.constraints - f(v[1],length(v)-1) # Generate permutations using gtools functions library(gtools) v.permutations - permutations(length(v), length(v), v) # Check each permutation results - apply(v.permutations,1, function(x) all(x = v.constraints)) # # Display Results # print(Original Vector) print(v) print(Constraint Vector) print(v.constraints) print(Does Vector meet Constraints) print(cbind(v.permutations,results)) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained vector permutation
I just realized I read through your email too quickly and my script does not actually address the constraint on each permutation, sorry about that. You should be able to use the permutations function to generate the vector permutations however. Jason __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained vector permutation
Hi Jason, Thanks for you suggestions, I think that's pretty close to what I'd need. The only glitch is that I'd be working with a vector of ~30 elements, so permutations(...) would take quite a long time. I only need one permutation per vector (the whole routine will be within a loop that generates pseudo-random vectors that could potentially conform to the constraints). In light of that, do you think I'd be better off doing something like: v.permutations - replicate(1,sample(v,length(v),rep=FALSE)) # instead of permutations() results - apply(v.permutations,2,function(x){all(x = f(x[1],length(x)-1))}) # function f(...) would be like your f It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster and so could be repeated until an acceptable vector is found. What do you think? Thanks-- Andy On Thu, Jan 28, 2010 at 6:15 AM, Jason Smith devja...@gmail.com wrote: I just realized I read through your email too quickly and my script does not actually address the constraint on each permutation, sorry about that. You should be able to use the permutations function to generate the vector permutations however. Jason __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained vector permutation
It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster and so could be repeated until an acceptable vector is found. What do you think? Thanks-- Andy I think I am not understanding what your ultimate goal is so I'm not sure I can give you appropriate advice. Are you looking for a single valid permutation or all of them? Since that constraint sets a ceiling on each subsequent value, it seems like you could solve this problem more easily and quickly by using a search strategy instead of random sampling or generating all permutations then testing. The constraint will help prune the search space so you only generate valid permutations. Once you are examining a particular element you can determine which of the additional elements would be valid, so only consider those. --jason __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained vector permutation
On Thu, 28 Jan 2010, Jason Smith wrote: It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster and so could be repeated until an acceptable vector is found. What do you think? Thanks-- Andy I think I am not understanding what your ultimate goal is so I'm not sure I can give you appropriate advice. Are you looking for a single valid permutation or all of them? Since that constraint sets a ceiling on each subsequent value, it seems like you could solve this problem more easily and quickly by using a search strategy instead of random sampling or generating all permutations then testing. The constraint will help prune the search space so you only generate valid permutations. Once you are examining a particular element you can determine which of the additional elements would be valid, so only consider those. It's easy to generate valid permutations this way. It does not appear straightforward to ensure that all valid permutations are sampled with equal probability, which I thought was part of the specification of the problem. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.