Quoting Alex Byrley <anbyr...@buffalo.edu>:
See, I have built my own genetic algorithm already and tested it on this
problem. I have a solution, but due to the heuristic nature of GA, I cannot
guarantee that it is the optimal subset.
If I was simply doing this for a company project, you are spot on with the
type of algorithm I would use, but I am doing this for a scientific paper.
I need to be able to find the optimal subset over my dataset, and I know
branch and bound will find it without resorting to exhaustive search. If I
can't claim that my subset is optimal, it is going to open my paper up to
serious enough criticism that it will get rejected, regardless of whether
my new method outperforms the state-of-the-art or not. (And regardless if
my dataset is representative enough to make such performance claims)
Just a comment:
A heuristic such as a Genetic Algorithm does indeed not guarantee the
optimum [*], but will only provide a stochastic approximation. This
approximation will get better if the algorithm gets more computing time;
but still, the solution will remain random.
However, people who read scientific papers can deal with such randomness,
as long as it is properly explained and reported. (Otherwise, no results
that relied on stochastic methods, such as MCMC, would ever have
been published.)
Some years ago, we wrote a short article on this topic (though on a
financial application):
https://link.springer.com/article/10.1007/s10732-010-9138-y
Kind regards (and good luck)
Enrico Schumann
[*] Which does not mean it has not found it.
I will take a look at that page, thanks! Hopefully there is an R
implementation of generic B&B as I described out there somewhere...
Alex Byrley
Graduate Student
Department of Electrical Engineering
235 Davis Hall
(716) 341-1802
2017-07-01 3:53 GMT-04:00 Enrico Schumann <e...@enricoschumann.net>:
On Thu, 29 Jun 2017, Alex Byrley writes:
> I am looking for packages that can run a branch-and-bound algorithm to
> maximize a distance measure (such as Bhattacharyya or Mahalanobis) on a
set
> of features.
>
> I would like this to be learning algorithm independent, so that the
method
> just looks at the features, and selects the subset of a user-defined size
> that maximizes a distance criteria such as those stated above.
>
> Can anyone give some suggestions?
>
> Alex Byrley
> Graduate Student
> Department of Electrical Engineering
> 235 Davis Hall
> (716) 341-1802
>
It seems you are looking for a generic optimisation
algorithm; so perhaps start at the task view:
https://cran.r-project.org/web/views/Optimization.html
What you describe is a combinatorial problem: select k
from N features, with k (much) smaller than N. So I'd
suggest to also look at heuristic algorithms that can
deal with such problems (e.g. genetic algorithms).
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.