It is very rare for variable selection to require much more than forward or reverse stepwise refinement. In practice, main effects and low-order interactions are prominent enough that you can see them one at a time. It is very unusual for interactions to be so balanced that they entirely mask the main effects.
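(None of the code below is from the original mail; it is a minimal sketch of the kind of greedy forward refinement meant above, in Python/numpy, with a plain least-squares fit as the scoring model and an arbitrary fixed term budget, max_terms.)

    import numpy as np

    def forward_stepwise(X, y, max_terms=5):
        # Greedily add whichever remaining column most reduces the
        # residual sum of squares of a least-squares fit.
        n, p = X.shape
        selected, remaining = [], list(range(p))
        for _ in range(max_terms):
            best_j, best_rss = None, np.inf
            for j in remaining:
                cols = selected + [j]
                beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
                rss = np.sum((y - X[:, cols] @ beta) ** 2)
                if rss < best_rss:
                    best_j, best_rss = j, rss
            selected.append(best_j)
            remaining.remove(best_j)
        return selected

Reverse (backward) refinement is the same idea run in the other direction, dropping the least useful term at each step.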
Also, if you are using something like regression (which is what many methods are at their heart), it is instructive to build a toy problem with a few main effects and many noise variables and then look at the posterior distribution of the coefficients. For non-interactive models, what you see is that the posterior distribution of the main effects does not vary with the addition of noise variables. What this means is that techniques like Bonferroni correction really lead you astray. What is happening with variable selection is that you gather enough data to be sure that you will find the main effects that you want. Whatever cutoff you choose will have some probability of selecting noise variables as well, but if you select only a few of them, then you are still pretty well off. The complexity of this sort of procedure is not very high at all. (A toy demonstration is sketched after the quoted message below.)

With interactions, the situation is a bit more complex because the design matrix isn't so nicely orthogonal, but the same rough principles apply.

The crux of all of this, then, is to have as good a selection test as possible, one that raises the probability of selecting terms with truly non-zero coefficients and of rejecting terms with truly zero coefficients. For very large numbers of variables, that clearly requires substantial regularization. SVM fans will introduce large-margin techniques for this and Bayesians will introduce priors, but the effect is essentially the same.

The Breiman paper on random forests also has some lovely techniques for selecting variables: re-run the training after randomizing the variable under consideration, or simply re-run the model evaluation. If the randomization has little effect, then you can presume that the variable is of little importance. (A rough sketch of the cheaper re-evaluation variant also appears below.) Random forests and all of the related techniques that Breiman suggested have the lovely property (for us) of being very compute intensive.

On Sat, May 31, 2008 at 7:17 AM, Karl Wettin <[EMAIL PROTECTED]> wrote:

> I've been thinking a bit about feature selection and have problems making
> up my mind about what sort of solutions make the most sense to initially
> focus on. If people have an unlimited number of nodes, then complexity is
> no longer a big problem.
>
> My guess is that people historically use nested subset ranking because the
> complexity is log N rather than the N of exhaustive-search wrappers. My gut
> tells me that the latter will produce a better result (given you also do a
> Bonferroni correction or so), but who has an unlimited number of nodes?
>
> The real question is hidden somewhere between the lines, and it doesn't
> only apply to feature selection.
>
> karl

--
ted
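(Not from the original thread: a toy run of the experiment described above, using an ordinary least-squares fit as a stand-in for the posterior mean under a flat prior; all sizes and coefficients are made up.)

    import numpy as np

    rng = np.random.default_rng(42)
    n = 1000
    true_beta = np.array([3.0, -2.0, 1.5])        # three real main effects
    x_real = rng.normal(size=(n, 3))
    y = x_real @ true_beta + rng.normal(size=n)   # unit-variance noise

    for n_noise in [0, 10, 100]:
        X = np.hstack([x_real, rng.normal(size=(n, n_noise))])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        # the estimates of the three real effects barely move as
        # noise columns pile up
        print(n_noise, np.round(beta_hat[:3], 3))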
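(Also not from the thread: a rough sketch of the cheaper Breiman-style check, re-evaluating rather than re-training after randomizing one column at a time; model is assumed to be any already-fitted object with a scikit-learn-style score(X, y) method.)

    import numpy as np

    def permutation_importance(model, X, y, rng):
        # How much the score drops after shuffling one column is a
        # proxy for that column's importance; a drop near zero means
        # the model hardly uses it.
        base = model.score(X, y)
        importances = []
        for j in range(X.shape[1]):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # destroy any signal in column j
            importances.append(base - model.score(Xp, y))
        return np.array(importances)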
