Re: [CODE4LIB] Automatic Content Classification recommendations?
ConceptSearch (http://www.conceptsearching.com/web/) is a commercial search engine and classification tool. Perhaps like TemaTres, it doesn't use machine learning; instead it extracts "concepts" from your documents that can be mapped to vocabulary terms. The vocabulary is then exposed to the end user as a search-results facet. It's all driven by MS SQL Server and exposed as web services.

We've used it here to map medical school lectures to the licensing exam outlines, and have experimented a little with autoclassifying the same lecture content by MeSH.

Jason

Jason Stirnaman
Biomedical Librarian, Digital Projects
A.R. Dykes Library, University of Kansas Medical Center
jstirna...@kumc.edu
913-588-7319

>>> On 11/28/2011 at 12:00 AM, Peter Neish wrote:
> Hi there,
>
> Just wondering if anyone has any recommendations for systems that will do
> automatic content classification through machine learning? We want to
> classify newspaper articles using terms from our existing thesaurus and
> have a fairly big set of articles already tagged that could be used as a
> training set. Services like OpenCalais don't really fit our need because
> we want to use our own thesaurus.
>
> Happy to look at both open source and commercial software.
>
> Thanks,
> Peter
Re: [CODE4LIB] Automatic Content Classification recommendations?
TemaTres Keyword Extractor is a tool for automatic categorization of texts based on supplied controlled vocabularies. It is a PHP tool that extracts terms from a text and uses them to obtain keywords from a specific controlled vocabulary, via the terminological web services provided by TemaTres.

TemaTres Keyword Extractor does not include a learning algorithm :( (though that is a great idea :)). It uses the Yahoo key extraction service or a local script.

You can see a demo here: http://vocabularyserver.com/distiller/
LTER implementation: http://vocab.lternet.edu/keywordDistiller/
Download: http://sourceforge.net/projects/tematreskeyword/

Best regards,
diego ferreyra

2011/11/28 Thomas Krichel:
> Peter Neish writes
>
>> Just wondering if anyone has any recommendations for systems that will do
>> automatic content classification through machine learning?
>
> I use LibSVM in AuthorClaim (http://authorclaim.org) and
> svm_light in NEP (http://nep.repec.org). I found both very helpful.
> I would switch to LibSVM in NEP, since LibSVM is still
> actively being developed. Just using a simple binary term
> weighting scheme and default SVM parameters should get you a
> long way.
>
> Cheers,
>
> Thomas Krichel http://openlib.org/home/krichel
> http://authorprofile.org/pkr1
> skype: thomaskrichel
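For readers who want a feel for the non-learning approach described in this message, here is a minimal Python sketch of matching a text against a controlled vocabulary. It is not TemaTres code and does not call the TemaTres web services; the function names, the dictionary layout (preferred term mapped to alternative labels), and the sample vocabulary are all assumptions made for illustration.

```python
import re


def normalize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(vocabulary):
    """Map every label (as a token tuple) to its preferred term.

    `vocabulary` is a hypothetical structure: preferred term -> alternative labels.
    """
    index = {}
    for preferred, alt_labels in vocabulary.items():
        for label in [preferred, *alt_labels]:
            index[tuple(normalize(label))] = preferred
    return index


def extract_keywords(text, index):
    """Return preferred terms whose labels occur in the text.

    At each position the longest matching label wins; terms are
    returned once each, in order of first appearance.
    """
    tokens = normalize(text)
    max_len = max((len(key) for key in index), default=0)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = index.get(tuple(tokens[i:i + n]))
            if term is not None:
                if term not in found:
                    found.append(term)
                i += n  # skip past the matched label
                break
        else:
            i += 1  # no label starts at this token
    return found


# Made-up sample vocabulary, for illustration only.
vocabulary = {
    "Public transport": ["public transit", "buses"],
    "Elections": ["election", "voting"],
}
index = build_index(vocabulary)
print(extract_keywords("Voting on the new public transit levy begins Monday.", index))
# → ['Elections', 'Public transport']
```

Exact label matching like this misses inflected forms and synonyms that are not in the vocabulary, which is one reason a trained classifier can be attractive when a tagged corpus already exists.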
Re: [CODE4LIB] Automatic Content Classification recommendations?
Peter Neish writes

> Just wondering if anyone has any recommendations for systems that will do
> automatic content classification through machine learning?

I use LibSVM in AuthorClaim (http://authorclaim.org) and svm_light in NEP (http://nep.repec.org). I found both very helpful. I would switch to LibSVM in NEP, since LibSVM is still actively being developed. Just using a simple binary term weighting scheme and default SVM parameters should get you a long way.

Cheers,

Thomas Krichel
http://openlib.org/home/krichel
http://authorprofile.org/pkr1
skype: thomaskrichel
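To make the "simple binary term weighting" suggestion concrete, here is a minimal Python sketch that turns tagged articles into LibSVM's sparse text format (`<label> <index>:<value> ...`, indices ascending). Each term that occurs in an article gets the value 1, regardless of how often it occurs. This is not code from AuthorClaim or NEP; the function names and sample articles are assumptions, and it builds one binary (one-vs-rest) problem per thesaurus term.

```python
import re


def tokenize(text):
    """Return the set of distinct lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_feature_ids(articles):
    """Assign a 1-based feature index to every distinct token in the corpus."""
    tokens = sorted(set().union(*(tokenize(text) for text, _ in articles)))
    return {token: i for i, token in enumerate(tokens, start=1)}


def to_libsvm_line(text, label, feature_ids):
    """Format one example; binary weighting means every present term is 1."""
    ids = sorted(feature_ids[t] for t in tokenize(text) if t in feature_ids)
    return " ".join([f"{label:+d}"] + [f"{i}:1" for i in ids])


def training_lines(articles, term, feature_ids):
    """Build a one-vs-rest problem: +1 if the article carries `term`, else -1."""
    return [
        to_libsvm_line(text, 1 if term in tags else -1, feature_ids)
        for text, tags in articles
    ]


# Made-up sample data: (article text, set of thesaurus terms already assigned).
articles = [
    ("Budget passes parliament", {"Budget"}),
    ("Parliament debates transport", {"Transport"}),
]
feature_ids = build_feature_ids(articles)
for line in training_lines(articles, "Budget", feature_ids):
    print(line)
# → +1 1:1 3:1 4:1
# → -1 2:1 3:1 5:1
```

Writing these lines to a file, one file per thesaurus term, gives input that LibSVM's `svm-train` command-line tool can read with its default parameters; unseen articles need to be encoded with the same `feature_ids` mapping before prediction.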
[CODE4LIB] Automatic Content Classification recommendations?
Hi there,

Just wondering if anyone has any recommendations for systems that will do automatic content classification through machine learning? We want to classify newspaper articles using terms from our existing thesaurus and have a fairly big set of articles already tagged that could be used as a training set. Services like OpenCalais don't really fit our need because we want to use our own thesaurus.

Happy to look at both open source and commercial software.

Thanks,
Peter

--
Peter Neish
Systems Officer
Victorian Parliamentary Library
Ph: 03 9651 8638
peter.ne...@parliament.vic.gov.au

Parliament of Victoria

Important Disclaimer Notice: The information contained in this email including any attachments, may be confidential and/or privileged. If you are not the intended recipient, please notify the sender and delete it from your system. Any unauthorised disclosure, copying or dissemination of all or part of this email, including any attachments, is not permitted. This email, including any attachments, should be dealt with in accordance with copyright and privacy legislation. Except where otherwise stated, views expressed are those of the individual sender.