Greetings. A subset of a problem that the group I work with turns out to be an optimization problem, in the sense of linear programming.
We've looked at several approaches to this but haven't found one that seems to be the right fit. I'm looking for some guidance in finding an R package that does what we want in an acceptable time. Here's a toy example. We start with a matrix, called "gMat1" (historical reasons): > gMat1 <- matrix(c(3, 6, 9, 5, 9, 5), nrow=3) > print(gMat1) [,1] [,2] [1,] 3 5 [2,] 6 9 [3,] 9 5 The goal is to add the contents of one column of each row to one of two bins (in general the number of bins equals the number of columns) such that the minimum number in each bin is maximized. An example follows. In the toy example, the possibilities can be enumerated simply: > allChoices <- expand.grid(rep(list(1:ncol(gMat1)), nrow(gMat1))) > print(allChoices) Var1 Var2 Var3 1 1 1 1 2 2 1 1 3 1 2 1 4 2 2 1 5 1 1 2 6 2 1 2 7 1 2 2 8 2 2 2 For example, with the first choice, (1, 1, 1), column 1, hence, bin 1, is selected for all three rows, giving a result for the two bins of: 18 0 For the second choice, (2, 1, 1), the '2' in the first position selects the contents of column 2 for bin 2. The remaining (1, 1) select the (6, 9) in the first column and assign those to bin 1: 15 5 The result is a set of "binSums" corresponding to the each of the set of possible choices: > print(binSums) [,1] [,2] [1,] 18 0 [2,] 15 5 [3,] 12 9 [4,] 9 14 [5,] 9 5 [6,] 6 10 [7,] 3 14 [8,] 0 19 Having generated the sums, the goal is to pick the row that has the largest minimum and map that back to the original choice. In the toy example, both rows 3 and 4 satisfy that criterion ('9' is the minimum in each, and '9' is bigger than the minima in the other 6 rows -- 0, 3, 5, 6). In the real case there are potentially thousands of rows and columns, so eye-balling is not an option. And, in fact, using "expand.grid" isn't even an option to generate the original choices. We've tried some ad hoc approaches that seem to work tolerably well. Here's one that we *might* have considered, if we *could* have generated the "allChoices/binSums": > bsVar <- apply(binSums, 1, var) > locBestVar <- which(bsVar == min(bsVar)) > allChoices[locBestVar, ] Var1 Var2 Var3 3 1 2 1 But there was a feeling within the group that we didn't have a solid-enough foundation using the ad hoc approaches. Hence, we asked for help from an expert in Operations Research. He was able to solve a problem of realistic size in more or less no time at all using the "GAMS" software: https://www.gams.com/ Unfortunately, GAMS is not free software, and we are hoping to produce an R package that is freely distributable. The next suggestion was to use Gurobi: http://www.gurobi.com/ which is evidently free for academic use but not otherwise. Better, but still not perfect. (And I couldn't use the free version of Gurobi while working from home, as it didn't consider my home network to be associated with an academic institution -- which of course it isn't). Finally, we tried: Rglpk_solve_LP from the R package "Rglpk": https://cran.r-project.org/web/packages/Rglpk/index.html This satisfied the licensing constraints, but we were unable to produce a result in a "finite" amount of time. By this I mean that we ran the Rglpk software on a problem with 200 rows and 20 columns on the latest Mac Pro with 64GB of memory, and it didn't finish overnight. A realistic problem would have at least 10000 rows and 50 columns. (This admittedly might simply have been a consequence of our unfamiliarity with the package. Suggestions welcome.) To be clear, this process is not something we'd be doing once in order to build the package. This is something an end user would have to do every time he/she ran our package. If you've managed to read this far and have any suggestions as to how we might proceed, please send them my way. Thanks. -- Mike ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.