Hi,

Apologies for this question being off-topic (OT) for R, though I expect this list may be the best place on the net to ask it.
In brief, the question is: what classification algorithm can one use when the features are histograms?

I have a classification problem, and I believe that histograms of the distribution of some values may be the best "features" to use. To keep this mail short, here is a simpler example problem: try to classify a person as, e.g., drunk or not, given the histogram of their driving speed.

In the training phase, we have a table whose rows contain the driver, whether they are drunk, and a sample of driving speed. From this one can build separate histograms of driving speed for drunk and non-drunk drivers. (In my actual application I have several such histogram features, and they are visibly different; they are also currently ranked by analytic pdf-distance measures such as KL divergence.)

Now, how to classify? Given a single speed, its probability can be evaluated under each of the two classes, but a single speed sample is not going to be reliable in this problem. Suppose instead that the _distribution_ of speeds is sufficient to discriminate. We have a driver and a distribution of their speeds over time, from which a histogram can be built. What to do with this histogram? Is there a standard classifier that can deal with this situation?

My thoughts:

- The test histogram could be compared to each of the training histograms with the Chi^2 measure (a sum of squared Gaussian deviations), and then a probability obtained from this?
- Alternatively, consider training histograms with n bins as points in n-dimensional space, and use Euclidean closeness in this space. This may not generalize to more than one such histogram feature, though.

Thanks for any thoughts. (Also thanks for the replies to my recent question about hashtable/dictionary.)
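
To make the two ideas above concrete, here is a minimal sketch in Python (not R, purely for illustration) of a nearest-histogram rule. It assumes one pooled, normalized histogram per class and a symmetric form of the chi-square distance. The function names, bin edges, and speed values are all made up for the example; this is not a standard classifier and not a recommendation.

```python
import math

def histogram(samples, edges):
    """Normalized histogram: fraction of samples in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def chi2_distance(p, q):
    """Symmetric chi-square distance; bins that are empty in both are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def euclidean_distance(p, q):
    """Treat each n-bin histogram as a point in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classify(test_hist, class_hists, distance):
    """Return the label whose training histogram is closest to test_hist."""
    return min(class_hists, key=lambda label: distance(test_hist, class_hists[label]))

# Hypothetical data: speeds are invented, bin edges are an arbitrary choice.
edges = [0, 20, 40, 60, 80, 100, 120]
train = {
    "sober": [52, 55, 58, 61, 63, 57, 59, 62, 54, 60],
    "drunk": [25, 95, 48, 110, 33, 72, 15, 88, 64, 41],
}
class_hists = {label: histogram(speeds, edges) for label, speeds in train.items()}

# A new driver's speeds over time, turned into a histogram on the same bins.
test_hist = histogram([56, 59, 61, 53, 62, 58], edges)

label_chi2 = classify(test_hist, class_hists, chi2_distance)
label_euclid = classify(test_hist, class_hists, euclidean_distance)
print(label_chi2, label_euclid)
```

With several histogram features, one possible extension (an assumption on my part, not something established above) would be to sum the per-feature distances for each class before taking the minimum.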