Hello,

I don't have any formal education in predictive models, so I hope my questions are not too naive and that my terminology is clear enough to be understood.
I'm trying to implement simple text categorization of short phrases of a few words (the specific application is categorizing bank transactions by payee name). Following the documentation, I easily implemented a solution based on the TF-IDF vectorizer and C-Support Vector Classification (SVC). However, for some input phrases the predicted category is wrong or unreliable.

I have a couple of (probably very basic) questions:

- Are my choices of algorithms the best for this problem? Is there something else I can experiment with to see if I can get better results?
- Is there a way to obtain the prediction likelihood, so that I could mark "bad" predictions for further inspection? I haven't found an (easy) way to do that in the documentation.

Thank you in advance.

Cheers,
Dan
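
P.S. To make the setup concrete, here is a minimal sketch of roughly what I mean. It is not my actual code: the payee names, the categories, the helper name predict_with_margin, and the 0.5 threshold are all made up for illustration. Using the gap between the two highest decision_function scores as a rough confidence proxy is only my guess at an approach, and that gap is not a calibrated probability.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up training data: payee names and hypothetical categories.
payees = [
    "tesco superstore", "sainsburys local", "aldi stores",
    "shell petrol station", "bp fuel", "esso garage",
    "netflix subscription", "spotify premium", "amazon prime video",
]
labels = [
    "groceries", "groceries", "groceries",
    "fuel", "fuel", "fuel",
    "entertainment", "entertainment", "entertainment",
]

# TF-IDF features followed by C-Support Vector Classification,
# both with default settings.
model = make_pipeline(TfidfVectorizer(), SVC())
model.fit(payees, labels)


def predict_with_margin(texts, threshold=0.5):
    """Return predictions, score margins, and a low-confidence flag.

    The margin is the gap between the two highest per-class scores from
    decision_function. The threshold is a hypothetical value that would
    need tuning on real data.
    """
    predictions = model.predict(texts)
    # With three or more classes the scores have shape (n_samples, n_classes).
    scores = model.decision_function(texts)
    top_two = np.sort(scores, axis=1)[:, -2:]
    margins = top_two[:, 1] - top_two[:, 0]
    flagged = margins < threshold
    return predictions, margins, flagged


preds, margins, flagged = predict_with_margin(
    ["tesco express", "unknown merchant xyz"]
)
for text_pred, margin, flag in zip(preds, margins, flagged):
    print(text_pred, round(float(margin), 3), "inspect" if flag else "ok")
```

The idea would be to send every phrase marked "inspect" to manual review instead of trusting the predicted category.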