2010/11/26 Jörn Kottmann <kottm...@gmail.com>

> On 11/26/10 9:35 AM, Tommaso Teofili wrote:
>
>> Hi all,
>> following Burn's proposal for a multimodal analysis component skeleton, I
>> also have a couple of components to propose for inclusion in the sandbox:
>>
>> - Solr CAS Consumer - to consume CAS/types/features inside Solr fields.
>> This could be put inside Lucas or in a separate project
>>
>
> As far as I know, the main difference from a configuration point of view
> is that Lucas defines the language analyzers inside the AE configuration
> and Solr defines them in a server-side XML configuration file.
> In the end there might not be much that could be reused from Lucas.
>

I only thought of Lucas because Lucene and Solr are so close that, at a high
level, it could make sense to have them inside the same component, but I agree
that at a functional level they are quite different.

> Lucas is not maintained right now, and I guess that is because most
> people are not interested in creating a Lucene index from a bunch of
> documents.
>

I heard of someone using it. If I can find the time I will try to maintain it
and update it to the latest Lucene (or maybe to 2.9.3, which is backward
compatible but still has some of the major 3.x improvements).

> The way we use UIMA is to process a stream of documents which are
> received continuously. In this model a Solr AE fits really nicely, because
> it just sends the received documents to a Solr server, which adds them
> to the index. After a document is received it can be searched with a
> short delay. With Lucas that would not be possible.
>
> I actually created a small Solr AE for a quick semantic search demo.
> One problem I ran into is that the Solr AE really slows down my
> processing pipeline. Anyway, I would be happy to test your implementation
> and contribute to it.
>

Nice! Thanks, looking forward to cooperating on it.

>> - a Simple Language Annotator - to extract language from document text,
>>
>> this one can use 3 algorithms:
>> - Tika 0.8 language identification capability
>> - Alchemy language annotator
>> - Dictionaries of stopwords for each language
>>
>
> We could easily add AEs which set the language to the Tika and
> Alchemy projects we already have. It can also be done with OpenNLP.
>

It's a good idea, so the third algorithm could be put inside the
DictionaryAnnotator.

Cheers,
Tommaso

> Jörn
>
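
For reference, the third option mentioned in the thread (dictionaries of stopwords for each language) could be sketched roughly as below. This is only a minimal illustration in plain Java, not the proposed annotator: the class name `StopwordLanguageGuesser`, the `guess` method, and the tiny word lists are all hypothetical, and a real AE would presumably read the text from the CAS and set the document language there instead of returning a string.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of UIMA, Tika, or any sandbox component.
class StopwordLanguageGuesser {

    // Tiny illustrative stopword lists; real dictionaries would be far larger.
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("en", Set.of("the", "and", "is", "of", "to", "in", "that", "it"));
        STOPWORDS.put("de", Set.of("der", "die", "das", "und", "ist", "nicht", "ein", "zu"));
        STOPWORDS.put("it", Set.of("il", "la", "che", "di", "e", "un", "per", "non"));
    }

    /** Returns the language code whose stopwords occur most often, or "unknown". */
    static String guess(String text) {
        Map<String, Integer> hits = new HashMap<>();
        // Lowercase, then split on any run of non-letter characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
                if (entry.getValue().contains(token)) {
                    hits.merge(entry.getKey(), 1, Integer::sum);
                }
            }
        }
        // Pick the language with the highest hit count; ties keep the earlier entry.
        String best = "unknown";
        int bestCount = 0;
        for (String lang : STOPWORDS.keySet()) {
            int count = hits.getOrDefault(lang, 0);
            if (count > bestCount) {
                best = lang;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("The document is sent to the server and it is indexed."));
        System.out.println(guess("Der Server ist nicht erreichbar und das ist ein Problem."));
    }
}
```

Counting raw stopword hits is the simplest possible scoring; it is unreliable on very short texts and between closely related languages, which is one reason to keep the Tika, Alchemy, or OpenNLP options available alongside it.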