Hello,

I have a problem with feature selection and would be thankful for your help.

I have a dataset with few samples (for example 100) and many features (for example 3000), and I have to do feature selection. If I use cross-validation (for example 10-fold), I rank the features on the 90 training samples using the SVM-RFE method and obtain a ranked list, for example {f2, f4, f1, f3, ...} (meaning f2 is ranked first by SVM-RFE).

Now I want to know how many features I should use. To decide, I would train the learner on the top n features of the ranked list for increasing n and compare the performance: first with f2 only, then with {f2, f4}, then with {f2, f4, f1}, and so on, to see which is best. My problems are:

1) For this comparison, should I use the performance on the 9 folds that were used for ranking, or the performance of the learner on the fold that was left out? I mean in the feature selection step (not in the final evaluation): for example, to decide whether I should select only f2 or {f2, f4}, how should I compare them?

2) In each round of cross-validation a different feature subset will be produced. I can count, for each feature, how many times it appears across the folds' results, but after that how do I arrive at the final feature set?

Can you please help me? I need your help urgently.

Thanks in advance,
Azadeh
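P.S. To make the procedure concrete, here is a minimal sketch of the loop I have in mind. It is plain Python only so that it is self-contained, and several things in it are assumptions made for illustration: the data are synthetic (30 features instead of 3000), `rank_features` is a simple mean-difference filter standing in for SVM-RFE, and `centroid_accuracy` is a nearest-centroid classifier standing in for the real learner. All function names are hypothetical. In each fold the ranking sees only the 90 training samples, and the top-n subsets are scored on the left-out fold, which is one of the two options asked about in question 1. The counter corresponds to the per-feature frequency mentioned in question 2.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def make_data(n_samples=100, n_features=30, n_informative=2, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


def rank_features(X, y):
    """Stand-in ranker (NOT SVM-RFE): order features by |class mean difference| / spread."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a) + pstdev(b) + 1e-12
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)]


def centroid_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Stand-in learner: nearest class centroid, using only the features in feats."""
    cents = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_tr, y_tr) if l == lab]
        cents[lab] = [mean(r[j] for r in rows) for j in feats]
    hits = 0
    for r, l in zip(X_te, y_te):
        dist = {lab: sum((r[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
                for lab in (0, 1)}
        hits += min(dist, key=dist.get) == l
    return hits / len(y_te)


def cv_curve(X, y, n_folds=10, max_n=10, seed=0):
    """Per fold: rank on the training part, score the top-n subsets on the left-out part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    acc = {n: [] for n in range(1, max_n + 1)}
    picked = Counter()  # number of folds in which a feature reached the top max_n
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        ranking = rank_features(X_tr, y_tr)  # the left-out fold is never seen here
        picked.update(ranking[:max_n])
        for n in acc:
            acc[n].append(centroid_accuracy(X_tr, y_tr, X_te, y_te, ranking[:n]))
    return {n: mean(v) for n, v in acc.items()}, picked


if __name__ == "__main__":
    X, y = make_data()
    mean_acc, picked = cv_curve(X, y)
    for n, a in mean_acc.items():
        print(n, round(a, 3))
    print(picked.most_common(5))
```

What I do not know is whether comparing the values in `mean_acc` is a valid way to choose n, and how to turn the counts in `picked` into one final feature set.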

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.