Re: UIMACpp - how to check annotator's dlopen

2012-02-21 Thread Eddie Epstein
This error is coming from libuima, which was successfully loaded by
the JVM, given -Djava.library.path. You're sure that the [appropriate]
LD_LIBRARY_PATH is set correctly for libuima? As a test put the
annotator library in some standard place, like /usr/lib. There's no
issue with 32 vs 64 bit libraries?

Eddie
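
One way to act on the suggestion above is to check, from inside a JVM started the same way as the aggregate, which search paths actually contain the annotator file. The following is only a rough sketch: the class name LibraryPathCheck and its helper methods are made up for illustration, the default file name is copied from the error message in this thread, and a file being found does not prove that dlopen will succeed (dependencies or a 32/64-bit mismatch can still break loading).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of UIMA or UIMACpp.
class LibraryPathCheck {

    /** Returns each file named fileName found directly in a directory of pathList. */
    static List<File> findInPath(String pathList, String fileName) {
        List<File> hits = new ArrayList<>();
        if (pathList == null || pathList.isEmpty()) {
            return hits;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    /** Prints the path list as seen by this process and any matching files. */
    static void report(String label, String pathList, String fileName) {
        System.out.println(label + " = " + pathList);
        for (File hit : findInPath(pathList, fileName)) {
            System.out.println("  found " + hit.getAbsolutePath());
        }
    }

    public static void main(String[] args) {
        // Assumed default: the file name that appears in the dlopen error.
        String lib = args.length > 0 ? args[0] : "UnitexAnnotatorCpp.dylib";
        // Architecture of the running JVM, useful for the 32 vs 64 bit question.
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        report("java.library.path", System.getProperty("java.library.path"), lib);
        // Environment variables as inherited by the JVM, which may differ from the shell.
        report("DYLD_LIBRARY_PATH", System.getenv("DYLD_LIBRARY_PATH"), lib);
        report("LD_LIBRARY_PATH", System.getenv("LD_LIBRARY_PATH"), lib);
    }
}
```

Running it with the same launcher script and -Djava.library.path setting as the aggregate shows whether the environment variable really reaches the JVM process.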

On Mon, Feb 20, 2012 at 11:54 AM, Sylvain Surcin sur...@kwaga.com wrote:
 Hello,

 Is there a way to get more info, or to trace the processes involved when
 initializing a C++ annotator?

 I get this message when launching an aggregate containing some
 Java annotators and one C++ annotator:

 annotator:creating JNILogger

 20/02/12 17:38:13 - 11:
 org.apache.uima.uimacpp.UimacppAnalysisComponent.log(393): GRAVE:
 org.apache.uima.cpp: 0
 Error number  : 2001
 Recoverable   : Yes
 Error         : Error loading annotator 'UnitexAnnotatorCpp'.
 'dlopen(UnitexAnnotatorCpp.dylib, 10): image not found'

 20/02/12 17:38:13 - 11:
 org.apache.uima.uimacpp.UimacppAnalysisComponent.log(393): GRAVE:
 org.apache.uima.cpp: 0 ResourceManager::requestAnnotatorFile() failed to
 find UnitexAnnotatorCpp

 Nevertheless, the annotator's UnitexAnnotatorCpp.dylib is reachable
 through the $DYLD_LIBRARY_PATH environment variable, and also through the
 -Djava.library.path when launching the aggregate. I even checked with a
 small C program using dlopen that the library can be loaded.

 What could I do next?

 Thanks for any advice.
 Sylvain

 --
 Sylvain SURCIN, Ph.D.
 *KWAGA*
 Senior Software Architect
 15, rue Jean-Baptiste Berlier
 75013 Paris
 France
 Tél.: +33 (0)6.32.78.83.31


InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Jens Grivolla

Hi,

it appears that InlineXMLCasConsumer depends on the system locale for 
some internal transformations. The output appears to be written in UTF8 
(outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a 
machine with a locale of ASCII all accented characters get broken.


I suspect that it has to do with the XMLSerializer working on a 
ByteArrayOutputStream, but haven't been able to track it down yet.


Any ideas?

Bye,
Jens



Re: InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Thilo Goetz
On 21/02/12 15:59, Jens Grivolla wrote:
 Hi,
 
 it appears that InlineXMLCasConsumer depends on the system locale for
 some internal transformations. The output appears to be written in UTF8
 (outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
 machine with a locale of ASCII all accented characters get broken.
 
 I suspect that it has to do with the XMLSerializer working on a
 ByteArrayOutputStream, but haven't been able to track it down yet.
 
 Any ideas?
 
 Bye,
 Jens
 

Have you checked that it's really the writing end where things
get corrupted, and not the reading end?  Just a thought...

--Thilo



Re: InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Jens Grivolla

On 02/21/2012 04:08 PM, Thilo Goetz wrote:

On 21/02/12 15:59, Jens Grivolla wrote:

it appears that InlineXMLCasConsumer depends on the system locale for
some internal transformations. The output appears to be written in UTF8
(outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
machine with a locale of ASCII all accented characters get broken.

I suspect that it has to do with the XMLSerializer working on a
ByteArrayOutputStream, but haven't been able to track it down yet.


Have you checked that it's really the writing end where things
get corrupted, and not the reading end?  Just a thought...


Yes, we have an XmiWriterCasConsumer in parallel that works fine.

Jens



Re: InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Thilo Goetz
On 21/02/12 16:15, Jens Grivolla wrote:
 On 02/21/2012 04:08 PM, Thilo Goetz wrote:
 On 21/02/12 15:59, Jens Grivolla wrote:
 it appears that InlineXMLCasConsumer depends on the system locale for
 some internal transformations. The output appears to be written in UTF8
 (outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
 machine with a locale of ASCII all accented characters get broken.

 I suspect that it has to do with the XMLSerializer working on a
 ByteArrayOutputStream, but haven't been able to track it down yet.

 Have you checked that it's really the writing end where things
 get corrupted, and not the reading end?  Just a thought...
 
 Yes, we have an XmiWriterCasConsumer in parallel that works fine.
 
 Jens
 

Ah yes, eyeballing the source gives:

  // return XML string
  return new String(byteArrayOutputStream.toByteArray());

This is in CasToInlineXml.java.  I stopped after I found this,
maybe there's more.  Jira, patch, you know the drill :-)

--Thilo
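
To illustrate why the line quoted above is locale-sensitive: new String(byte[]) decodes with the platform default charset, so bytes produced as UTF-8 are misread when the default is ASCII. The sketch below is not the actual patch to CasToInlineXml.java; the class and method names are invented, and it only contrasts decoding with US-ASCII against decoding with an explicit UTF-8 charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the UIMA sources.
class CharsetRoundTrip {

    /** Decodes the buffer with an explicit charset instead of the platform default. */
    static String decode(ByteArrayOutputStream buffer, Charset charset) {
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", one accented character
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write(utf8, 0, utf8.length);

        // Simulates a machine whose default charset is ASCII.
        String asAscii = decode(buffer, StandardCharsets.US_ASCII);
        // Names the charset that was used when the bytes were written.
        String asUtf8 = decode(buffer, StandardCharsets.UTF_8);

        System.out.println(original.equals(asAscii)); // false
        System.out.println(original.equals(asUtf8));  // true
    }
}
```

Assuming the serializer really writes UTF-8 into the ByteArrayOutputStream, passing the charset explicitly when building the String would make the result independent of the locale.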


Re: Iterating over all annotation types of a token

2012-02-21 Thread Himanshu Gahlot
Thanks Marshall, this would work for me!

Himanshu Gahlot
Software Engineer
T 415.344.1336
235 2nd St, San Francisco, CA 94105




On Tue, Feb 21, 2012 at 11:09 AM, Marshall Schor m...@schor.com wrote:

 How about something like:

 First, get an iterator over tokens, using the base AnnotationIndex (e.g.
 index = aCAS.getAnnotationIndex(your-token-type)).  Then, I presume you
 will have some kind of a loop that iterates some inner code for each token.
 In the inner code, you want to have an iterator over annotations, that are
 contained within the begin / end span of the token (is that correct?).  You
 can use a subiterator for that.  See
 http://uima.apache.org/d/uimaj-2.4.0/references.html#ugr.ref.cas.index.annotation_index

 There is a flag (strict / non-strict): strict means that the returned
 annotation lies entirely within the span of the controlling annotation;
 non-strict means only that the beginning of the returned annotation falls
 within the span.

 Does this help?

 -Marshall
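
The strict / non-strict distinction described above can be modelled in plain Java without the UIMA classes. This is only a sketch of the containment rule: Span and within are invented names, the treatment of an annotation that begins exactly at the token's end offset is an assumption, and the model ignores index ordering. In UIMA itself the corresponding call should be the subiterator method on AnnotationIndex, which the reference linked above documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of span containment; none of these names come from UIMA.
class SpanFilter {

    /** Minimal stand-in for an annotation: a type name plus begin/end offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Returns the candidates that begin inside the token's span.
     * With strict == true they must also end inside it.
     */
    static List<Span> within(Span token, List<Span> candidates, boolean strict) {
        List<Span> result = new ArrayList<>();
        for (Span s : candidates) {
            if (s == token) {
                continue; // skip the controlling annotation itself
            }
            // Assumption: a span starting exactly at token.end counts as outside.
            boolean beginsInside = s.begin >= token.begin && s.begin < token.end;
            boolean endsInside = s.end <= token.end;
            if (beginsInside && (!strict || endsInside)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Span token = new Span("Token", 10, 15);
        List<Span> all = Arrays.asList(
                token,
                new Span("Location", 10, 15),      // same span as the token
                new Span("Organization", 10, 22),  // starts in the token, ends after it
                new Span("Date", 30, 35));         // elsewhere in the text

        for (Span s : within(token, all, true)) {
            System.out.println("strict: " + s.type);      // Location
        }
        for (Span s : within(token, all, false)) {
            System.out.println("non-strict: " + s.type);  // Location, then Organization
        }
    }
}
```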


 On 2/21/2012 1:44 PM, Himanshu Gahlot wrote:

 Hi,

 Is there a straightforward way to iterate over all the annotation types of
 a token? For example, if a token has been annotated as a location,
 organization, food-related entity, subject-of-a-sentence, etc. (in
 addition to being annotated as a token, of course), then is there a way to
 fetch all the annotations for this token without iterating over each
 individual type of annotation and comparing the position of this token in
 every annotation? Am I missing something obvious?

 Thanks,
 Himanshu Gahlot
 Software Engineer
 T 415.344.1336
 235 2nd St, San Francisco, CA 94105
