Re: [CODE4LIB] Automatic Content Classification recommendations?

2011-11-28 Thread Jason Stirnaman
ConceptSearch http://www.conceptsearching.com/web/ is a commercial search 
engine and classification tool. Perhaps like TemaTres, it doesn't use 
machine learning but extracts "concepts" from your documents that can be 
mapped to vocabulary terms. The vocabulary is then exposed to the end user as 
a search-results facet. It's all driven by MS SQL Server and exposed as web 
services. 
We've used it here to map medical school lectures to the licensing exam 
outlines and have experimented a little with autoclassifying the same lecture 
content by MeSH. 

Jason


Jason Stirnaman
Biomedical Librarian, Digital Projects
A.R. Dykes Library, University of Kansas Medical Center
jstirna...@kumc.edu
913-588-7319


>>> On 11/28/2011 at 12:00 AM, Peter Neish wrote:


Hi there,

Just wondering if anyone has any recommendations for systems that will do
automatic content classification through machine learning? We want to
classify newspaper articles using terms from our existing thesaurus and
have a fairly big set of articles already tagged that could be used as a
training set. Services like OpenCalais don't really fit our need because
we want to use our own thesaurus. Happy to look at both open source and
commercial software.

Thanks,

Peter

--
Peter Neish
Systems Officer
Victorian Parliamentary Library
Ph: 03 9651 8638
peter.ne...@parliament.vic.gov.au






//////

Parliament of Victoria  
  .
Important Disclaimer Notice:


The information contained in this email, including any attachments, may be
confidential and/or privileged. If you are not the intended recipient, please
notify the sender and delete it from your system. Any unauthorised
disclosure, copying or dissemination of all or part of this email, including
any attachments, is not permitted. This email, including any attachments, should
be dealt with in accordance with copyright and privacy legislation.
Except where otherwise stated, views expressed are those of the individual 
sender.


Re: [CODE4LIB] Automatic Content Classification recommendations?

2011-11-28 Thread diego ferreyra
TemaTres Keyword Extractor is a tool for the automatic categorization of
texts based on supplied controlled vocabularies. It is a PHP tool that
extracts terms from a text and uses them to obtain keywords from a
specific controlled vocabulary, through the terminological web services
provided by TemaTres.
TemaTres Keyword Extractor does not include a machine-learning algorithm :( ...
(it is a great idea :)) ... it uses the Yahoo key extraction service or a
local script.
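
To make the idea concrete, here is a minimal sketch of matching a text
against a controlled vocabulary. It is not the TemaTres code (which is PHP
and calls web services); the tiny thesaurus, the LABEL_TO_TERM name and the
extract_keywords function are all made up for illustration.

```python
import re

# Hypothetical mini-thesaurus: every label (preferred or alternative)
# maps to the preferred term that should be returned as a keyword.
LABEL_TO_TERM = {
    "water quality": "Water quality",
    "lakes": "Lakes",
    "lake": "Lakes",
    "nitrogen": "Nitrogen",
}


def extract_keywords(text, label_to_term):
    """Return the preferred terms whose labels occur in the text.

    Labels are matched as whole words or phrases, ignoring case
    and punctuation.
    """
    # Normalize: lowercase, keep only alphanumeric tokens, join with spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    found = set()
    for label, term in label_to_term.items():
        if re.search(r"\b" + re.escape(label) + r"\b", normalized):
            found.add(term)
    return sorted(found)


keywords = extract_keywords(
    "Nitrogen loading and water-quality trends in a shallow lake.",
    LABEL_TO_TERM,
)
print(keywords)
```

There is no learning step here: a term is suggested only when one of its
labels literally appears in the text.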

You can see a demo here: http://vocabularyserver.com/distiller/

LTER implementation: http://vocab.lternet.edu/keywordDistiller/

Download:
http://sourceforge.net/projects/tematreskeyword/

best regards

diego ferreyra



2011/11/28 Thomas Krichel :
>  Peter Neish writes
>
>> Just wondering if anyone has any recommendations for systems that will do
>> automatic content classification through machine learning?
>
>  I use LibSVM in AuthorClaim (http://authorclaim.org) and
>  svm_light in NEP (http://nep.repec.org). I found both very helpful.
>  I would switch to LibSVM in NEP, since LibSVM is still
>  actively being developed. Just using a simple binary term
>  weighting scheme and default SVM parameters should get you a
>  long way.
>
>
>  Cheers,
>
>  Thomas Krichel                    http://openlib.org/home/krichel
>                                      http://authorprofile.org/pkr1
>                                               skype: thomaskrichel
>


Re: [CODE4LIB] Automatic Content Classification recommendations?

2011-11-27 Thread Thomas Krichel
  Peter Neish writes

> Just wondering if anyone has any recommendations for systems that will do
> automatic content classification through machine learning?
  
  I use LibSVM in AuthorClaim (http://authorclaim.org) and 
  svm_light in NEP (http://nep.repec.org). I found both very helpful.
  I would switch to LibSVM in NEP, since LibSVM is still 
  actively being developed. Just using a simple binary term 
  weighting scheme and default SVM parameters should get you a 
  long way.
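
  As a rough illustration of binary term weighting, the sketch below turns
  labelled texts into the sparse "<label> <index>:<value>" lines that both
  LibSVM and svm_light read as training input. It is not the AuthorClaim or
  NEP code; the function names, the tokenizer and the two sample documents
  are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Give each distinct term a 1-based feature index, in sorted order."""
    terms = sorted({t for text in texts for t in tokenize(text)})
    return {term: i for i, term in enumerate(terms, start=1)}


def to_sparse_line(label, text, vocabulary):
    """Format one document as '<label> <index>:1 ...', indices ascending.

    Binary weighting: a term that occurs gets the value 1 no matter how
    often it occurs; absent or unknown terms are simply left out.
    """
    indices = sorted({vocabulary[t] for t in tokenize(text) if t in vocabulary})
    return " ".join([f"{label:+d}"] + [f"{i}:1" for i in indices])


# Made-up sample: +1 means "tagged with the term", -1 means "not tagged".
docs = [
    (+1, "Budget debate in parliament"),
    (-1, "Football final draws record crowd"),
]
vocab = build_vocabulary([text for _, text in docs])
lines = [to_sparse_line(label, text, vocab) for label, text in docs]
print("\n".join(lines))
```

  A file of such lines can be passed to LibSVM's svm-train or svm_light's
  svm_learn. With a thesaurus you would typically write one such file per
  term and train one binary classifier for each.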


  Cheers,

  Thomas Krichel                    http://openlib.org/home/krichel
                                      http://authorprofile.org/pkr1
                                               skype: thomaskrichel


[CODE4LIB] Automatic Content Classification recommendations?

2011-11-27 Thread Peter Neish
Hi there,

Just wondering if anyone has any recommendations for systems that will do
automatic content classification through machine learning? We want to
classify newspaper articles using terms from our existing thesaurus and
have a fairly big set of articles already tagged that could be used as a
training set. Services like OpenCalais don't really fit our need because
we want to use our own thesaurus. Happy to look at both open source and
commercial software.

Thanks,

Peter

--
Peter Neish
Systems Officer
Victorian Parliamentary Library
Ph: 03 9651 8638
peter.ne...@parliament.vic.gov.au





