In my day job I train text classifiers that are useful for a wide variety of health surveillance tasks. The data used to train these classifiers, however, cannot be shared because of confidentiality protections. I would like to make the trained models available to others, just as cTAKES does, but I'm not sure how. Can you tell me how cTAKES does it, or point me to resources that might be useful?
My models tend to be regularized logistic regression models trained on bag-of-words features. I suspect that I can get some protection by hashing everything to a fixed space first, but if there's a different well-established approach out there I'd rather use that.

Alex Measure
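
To make the hashing idea concrete, here is a minimal sketch of what I mean by "hashing everything to a fixed space". The function name `hash_features`, the bucket count, and the choice of SHA-256 are all my own assumptions for illustration; this is not how cTAKES or any particular library does it. Each token is mapped to one of a fixed number of buckets, and the model would be trained on bucket counts rather than on the words themselves.

```python
import hashlib
from collections import Counter


def hash_features(tokens, n_buckets=2**18):
    """Map tokens to a sparse {bucket_index: count} dict (hypothetical helper).

    n_buckets is an assumed size for the fixed feature space.
    """
    counts = Counter()
    for token in tokens:
        # A stable hash of the token's bytes; SHA-256 is an arbitrary choice here.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # Use the first 8 bytes as an integer and fold it into the fixed space.
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        counts[bucket] += 1
    return dict(counts)


# Example: repeated tokens land in the same bucket and are counted together.
features = hash_features(["fever", "cough", "fever"])
```

One caveat I am aware of: anyone who receives the model also needs the same hash function to featurize new text, so they could hash a list of candidate words and look up the corresponding weights. That is part of why I am asking whether a better-established approach exists.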
