Re: UIMACpp - how to check annotator's dlopen
This error is coming from libuima, which was successfully loaded by the JVM, given -Djava.library.path. You're sure that the [appropriate] LD_LIBRARY_PATH is set correctly for libuima? As a test, put the annotator library in some standard place, like /usr/lib. There's no issue with 32- vs 64-bit libraries?

Eddie

On Mon, Feb 20, 2012 at 11:54 AM, Sylvain Surcin sur...@kwaga.com wrote:

Hello,

Is there a way to get more info, or to trace the processes involved when initializing a C++ annotator?

I get this message when launching an aggregate containing some Java annotators and one C++ annotator:

annotator:creating JNILogger
20/02/12 17:38:13 - 11: org.apache.uima.uimacpp.UimacppAnalysisComponent.log(393): GRAVE: org.apache.uima.cpp: 0 Error number : 2001 Recoverable : Yes Error : Error loading annotator 'UnitexAnnotatorCpp'. 'dlopen(UnitexAnnotatorCpp.dylib, 10): image not found'
20/02/12 17:38:13 - 11: org.apache.uima.uimacpp.UimacppAnalysisComponent.log(393): GRAVE: org.apache.uima.cpp: 0 ResourceManager::requestAnnotatorFile() failed to find UnitexAnnotatorCpp

Nevertheless, the annotator's UnitexAnnotatorCpp.dylib is available through the $DYLD_LIBRARY_PATH environment variable, and also through -Djava.library.path when launching the aggregate. I even checked, with a small C program using dlopen, that the library can be loaded.

What could I do next?

Thanks for any advice.
Sylvain
--
Sylvain SURCIN, Ph.D.
*KWAGA*
Senior Software Architect
15, rue Jean-Baptiste Berlier
75013 Paris
France
Tél.: +33 (0)6.32.78.83.31
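A note on Eddie's point: -Djava.library.path only governs libraries the JVM loads itself (System.loadLibrary), which is how libuima got in. The annotator is then dlopen'ed by native code inside libuima, and dlopen consults the dynamic loader's search path (DYLD_LIBRARY_PATH on Mac OS X, LD_LIBRARY_PATH on Linux), not java.library.path. A shell where the variable is set is not necessarily the environment the JVM was launched with (IDEs and launch scripts can differ), so one way to check is a small diagnostic run with the same launcher. The sketch below is hypothetical (the class LibPathCheck and its method are illustrative names, not part of UIMA); it only reports which directories on each search path contain the library file as this JVM process sees them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: report where a native library file is found on the
// search paths visible to *this* JVM process (and thus to libuima inside it).
public class LibPathCheck {

    // Split a path-list string (e.g. the value of DYLD_LIBRARY_PATH) and return
    // every directory entry that contains a regular file named libFileName.
    static List<File> findLibrary(String searchPath, String libFileName) {
        List<File> hits = new ArrayList<File>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits; // variable unset or empty: nothing to search
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as "a::b"
            }
            File candidate = new File(dir, libFileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String lib = args.length > 0 ? args[0] : "UnitexAnnotatorCpp.dylib";
        // Environment variables as inherited by this process at launch.
        for (String var : new String[] {"DYLD_LIBRARY_PATH", "LD_LIBRARY_PATH"}) {
            String value = System.getenv(var);
            System.out.println(var + "=" + value + " -> " + findLibrary(value, lib));
        }
        // Used by the JVM for System.loadLibrary, not by native dlopen calls.
        String jlp = System.getProperty("java.library.path");
        System.out.println("java.library.path=" + jlp + " -> " + findLibrary(jlp, lib));
    }
}
```

If the library shows up only under java.library.path, or the environment variable prints as null, that would be consistent with the "image not found" error above.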
InlineXMLCasConsumer fails depending on locale
Hi,

it appears that InlineXMLCasConsumer depends on the system locale for some internal transformations. The output appears to be written in UTF-8 (outStream.write(xmlAnnotations.getBytes("UTF-8"));), but when it is used on a machine with an ASCII locale, all accented characters get broken. I suspect that it has to do with the XMLSerializer working on a ByteArrayOutputStream, but I haven't been able to track it down yet. Any ideas?

Bye,
Jens
Re: InlineXMLCasConsumer fails depending on locale
On 21/02/12 15:59, Jens Grivolla wrote:
Hi, it appears that InlineXMLCasConsumer depends on the system locale for some internal transformations. The output appears to be written in UTF-8 (outStream.write(xmlAnnotations.getBytes("UTF-8"));), but when it is used on a machine with an ASCII locale, all accented characters get broken. I suspect that it has to do with the XMLSerializer working on a ByteArrayOutputStream, but I haven't been able to track it down yet. Any ideas? Bye, Jens

Have you checked that it's really the writing end where things get corrupted, and not the reading end? Just a thought...

--Thilo
Re: InlineXMLCasConsumer fails depending on locale
On 02/21/2012 04:08 PM, Thilo Goetz wrote:
Have you checked that it's really the writing end where things get corrupted, and not the reading end? Just a thought...

Yes, we have an XmiWriterCasConsumer in parallel that works fine.

Jens
Re: InlineXMLCasConsumer fails depending on locale
On 21/02/12 16:15, Jens Grivolla wrote:
Yes, we have an XmiWriterCasConsumer in parallel that works fine.

Ah yes, eyeballing the source gives:

    // return XML string
    return new String(byteArrayOutputStream.toByteArray());

This is in CasToInlineXml.java. I stopped after I found this; maybe there's more. Jira, patch, you know the drill :-)

--Thilo
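The line Thilo found uses the String(byte[]) constructor, which decodes with the platform default charset, i.e. the locale-dependent step Jens suspected. A minimal, self-contained sketch of the symptom and the likely fix follows. It assumes the serializer writes UTF-8 bytes into the ByteArrayOutputStream (the XML default, not verified against CasToInlineXml here); the class and method names are hypothetical, and US_ASCII stands in for the default charset of a machine with an ASCII locale.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the charset round trip in CasToInlineXml-style code.
public class CharsetRoundTrip {

    // Stand-in for the XML serializer: write the markup as UTF-8 bytes.
    static ByteArrayOutputStream serialize(String xml) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length); // this overload declares no IOException
        return out;
    }

    // What new String(bytes) effectively does: decode with some charset that
    // may not match the one used for writing (here passed in explicitly).
    static String decodeWith(ByteArrayOutputStream out, Charset cs) {
        return new String(out.toByteArray(), cs);
    }

    // Fix: decode with the same charset the bytes were written in.
    static String decodeUtf8(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute, escaped so the source itself stays ASCII-only.
        ByteArrayOutputStream out = serialize("<p>caf\u00e9</p>");
        // Non-ASCII bytes become replacement characters under US-ASCII.
        System.out.println(decodeWith(out, StandardCharsets.US_ASCII));
        // Round-trips correctly regardless of the platform default.
        System.out.println(decodeUtf8(out));
    }
}
```

An equivalent one-line change would be byteArrayOutputStream.toString("UTF-8"), which throws the checked UnsupportedEncodingException; the StandardCharsets form avoids that.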
Re: Iterating over all annotation types of a token
Thanks Marshall, this would work for me!

Himanshu Gahlot
Software Engineer
T 415.344.1336
235 2nd St, San Francisco, CA 94105

On Tue, Feb 21, 2012 at 11:09 AM, Marshall Schor m...@schor.com wrote:

How about something like:

First, get an iterator over tokens, using the base AnnotationIndex (e.g. index = aCAS.getAnnotationIndex(your-token-type)).

Then, I presume you will have some kind of a loop that iterates some inner code for each token.

In the inner code, you want to have an iterator over annotations that are contained within the begin / end span of the token (is that correct?). You can use a subiterator for that. See http://uima.apache.org/d/uimaj-2.4.0/references.html#ugr.ref.cas.index.annotation_index

There is a flag (strict / non-strict): strict says that the returned annotation lies within the span of the controlling annotation; non-strict says only that the beginning of the returned annotation falls within the span.

Does this help?

-Marshall

On 2/21/2012 1:44 PM, Himanshu Gahlot wrote:

Hi,

Is there a straightforward way to iterate over all the annotation types of a token? For example, if a token has been annotated as a location, organization, food-related entity, subject-of-a-sentence, etc. (in addition to being annotated as a token, of course), is there a way to fetch all the annotations for this token without iterating over each individual type of annotation and comparing the position of this token in every annotation? Am I missing something obvious?

Thanks,
Himanshu Gahlot
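Marshall's suggestion (an outer loop over the token index, an inner subiterator per token) might look roughly like this against the UIMA 2.x Java CAS API. This is a sketch, not code from the thread: the class TokenAnnotations and its methods are hypothetical, the token Type is assumed to come from your own type system, and the token itself is filtered out defensively. With strict = true, annotations that extend past the token (for instance a multi-token location) are not returned, which matches Marshall's "is that correct?" caveat.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.cas.text.AnnotationIndex;

// Hypothetical helper: list every annotation, of any type, inside a token's span.
public class TokenAnnotations {

    static List<AnnotationFS> annotationsWithin(CAS cas, AnnotationFS token) {
        List<AnnotationFS> result = new ArrayList<AnnotationFS>();
        // The general annotation index covers all annotation types.
        AnnotationIndex<AnnotationFS> all = cas.getAnnotationIndex();
        // ambiguous = true: also return annotations that overlap each other;
        // strict = true: returned annotations must end within the token too.
        FSIterator<AnnotationFS> it = all.subiterator(token, true, true);
        while (it.isValid()) {
            AnnotationFS a = it.get();
            if (!a.equals(token)) { // defensive: never report the token itself
                result.add(a);
            }
            it.moveToNext();
        }
        return result;
    }

    // Outer loop over tokens, inner subiterator per token.
    static void printTokenAnnotations(CAS cas, Type tokenType) {
        FSIterator<AnnotationFS> tokens = cas.getAnnotationIndex(tokenType).iterator();
        while (tokens.isValid()) {
            AnnotationFS token = tokens.get();
            System.out.println(token.getCoveredText());
            for (AnnotationFS a : annotationsWithin(cas, token)) {
                System.out.println("  " + a.getType().getName()
                        + " [" + a.getBegin() + "," + a.getEnd() + "]");
            }
            tokens.moveToNext();
        }
    }
}
```

Using the general index rather than one index per type is what avoids the per-type loop Himanshu wanted to skip; the subiterator does the position comparison.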