Hi Wayne,
On Thu, Sep 27, 2012 at 6:17 PM, Wayne Rasmuss <wayne.rasm...@perceptivesoftware.com> wrote:
> I've been working with UIMA and OpenNLP together. Basically I've got the
> OpenNLP/UIMA example working. This gives me annotated text with tokens,
> sentences, parts of speech, chunks (verb phrase, noun phrase, etc.) It also
> attempts organizations, dates and locations though I don't get reliable
> results with them. Mostly I'm interested in parts of speech and chunks
> anyway.
>

Word-level NLP annotations are currently not included in the Enhancement Results, mainly because they would add 20+ triples per word. However, this feature will be added with STANBOL-733 "Stanbol NLP processing". Development is done in a separate branch [1]. This branch also includes its own Stanbol Launcher that makes it easy to test the current state of development (build and start the launcher, then post some text to http://localhost:8080/enhancer/chain/nlp-processing).

Here is a short overview; details can be found in JIRA:

* AnalysedText: a Java domain model that represents the results of NLP. The AnalysedText is added to the ContentItem as a ContentPart (see STANBOL-734 for code examples).
* NLP 2 RDF: an EnhancementEngine that converts the information in the AnalysedText to RDF using NIF (NLP Interchange Format), a set of OWL ontologies for formally representing NLP results (see STANBOL-741). NOTE that the NLP results provided by the nlp-processing chain of the Stanbol Launcher already use NIF.
* The opennlp.pos EnhancementEngine supports POS tagging for all languages supported by OpenNLP (STANBOL-735). As part of that it also detects and adds Sentence annotations. The opennlp.chunker EnhancementEngine consumes Tokens and POS tags and performs chunking (STANBOL-736). Chunking is supported for English and German.
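To make the kind of word-level results described above more concrete (tokens with POS tags, grouped into chunks), here is a minimal Python sketch of an offset-based ("stand-off") model in which every annotation points back into the original text via character offsets. It is a hypothetical illustration only, assuming such an offset-based layout: the names Token, Chunk, AnnotatedText and tagged_chunks are made up here and are NOT the AnalysedText API (see STANBOL-734 for real code examples).

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model for illustration only.
# This is NOT the Stanbol AnalysedText API (see STANBOL-734).


@dataclass
class Token:
    start: int  # character offset where the token begins
    end: int    # character offset just past the token
    pos: str    # part-of-speech tag, e.g. as produced by a POS tagging engine


@dataclass
class Chunk:
    start: int
    end: int
    tag: str    # chunk type, e.g. "NP" (noun phrase) or "VP" (verb phrase)


@dataclass
class AnnotatedText:
    text: str
    tokens: list = field(default_factory=list)
    chunks: list = field(default_factory=list)

    def span_text(self, span):
        """Return the text covered by a token or chunk via its offsets."""
        return self.text[span.start:span.end]

    def tokens_in(self, chunk):
        """Return the tokens that lie completely inside the given chunk."""
        return [t for t in self.tokens
                if t.start >= chunk.start and t.end <= chunk.end]

    def tagged_chunks(self):
        """Yield (chunk tag, [(word, POS), ...]) pairs in document order."""
        for chunk in sorted(self.chunks, key=lambda c: c.start):
            words = [(self.span_text(t), t.pos) for t in self.tokens_in(chunk)]
            yield chunk.tag, words
```

For example, for the text "The launcher runs" with tokens tagged DT/NN/VBZ and chunks NP (0-12) and VP (13-17), tagged_chunks() yields ("NP", [("The", "DT"), ("launcher", "NN")]) followed by ("VP", [("runs", "VBZ")]). Keeping offsets instead of copied strings is what lets several engines (POS, chunker, sentiment) add layers to the same text independently.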
There is also a sentiment.wordclassifier EnhancementEngine that adds word-level sentiment tags (based on SentiWordNet for English and SentiWS for German).

You might also want to have a look at the presentation [2] about the Stanbol NLP processing module that I gave at the MOLDE workshop in Leipzig this week.

[1] http://svn.apache.org/repos/asf/stanbol/branches/stanbol-nlp-processing/
[2] http://stanbol.apache.org/presentations/Stanbol_NLP_processing_2012-09.pdf

> I've been looking around and Stanbol looks like it may be easier to deal
> with and give me more advanced capabilities. I've done the first part of
> the getting started guide, but not the "full" version. I got the web
> interface up and was able to get some enhanced text. So that was great.
>
> After that I'm kind of stumped. I would like to get the annotated text
> (like I'm getting from UIMA/OpenNLP) so we can do analysis on it. Can
> someone help get started with setting up/calling stanbol so I can get the
> details in the enhanced result?
>

If you want to stay with the RESTful service, you will need to implement against the NIF output generated by the "NLP2RDF" engine. If you plan to access the Stanbol Enhancer via its Java API, I think the AnalysedText API (STANBOL-734) should give you everything you need.

You might also want to consider implementing your own analysis as a Stanbol EnhancementEngine. This blog post [3] provides a good introduction on how to do that.

[3] http://blog.iks-project.eu/getting-started-with-apache-stanbol-enhancement-engine/

>
> We're working with Groovy as our glue code. Bertrand provided me with this
> example. https://gist.github.com/2931050 which looks very promising, I think
> what I need to do is basically add OpenNLP enhancers here and figure out
> how to call it.
>

The "opennlp.pos" and "opennlp.chunker" engines should provide exactly the information you are looking for.
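For the RESTful route, here is a minimal Python sketch (standard library only) of sending plain text to the nlp-processing chain of the development launcher. It assumes the launcher runs at http://localhost:8080 as described above, that the chain accepts text/plain input, and that it returns Turtle when asked via the Accept header; build_enhancement_request and enhance are hypothetical helper names, not part of Stanbol.

```python
import urllib.request

# Assumption: the development launcher from the stanbol-nlp-processing
# branch is running locally on port 8080.
BASE_URL = "http://localhost:8080"


def build_enhancement_request(text, chain="nlp-processing",
                              base_url=BASE_URL, accept="text/turtle"):
    """Build a POST request that sends plain text to an enhancement chain.

    Asking for Turtle via the Accept header is an assumption; another
    RDF serialization may be preferable depending on the setup.
    """
    url = f"{base_url}/enhancer/chain/{chain}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8",
                 "Accept": accept},
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the request and return the RDF (NIF) response body as a string."""
    req = build_enhancement_request(text, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With the launcher running, calling enhance("Apache Stanbol was presented in Leipzig.") should return the enhancement results as RDF text, which you could then parse with an RDF library of your choice. Separating request construction from sending keeps the request easy to inspect without a running server.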
AFAIK the Apache Camel example provided by Bertrand should allow you to call the corresponding Engines/Chain and also support direct access to the results stored in the AnalysedText content part. But as I am not familiar with Camel, it would be good if Bertrand could confirm this.

Please NOTE that Stanbol NLP processing is still under heavy development, so things might still change. The current plan is to have a first, reasonably stable version of STANBOL-733 available in trunk by the end of October.

best
Rupert

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen