date:20120221

Re: UIMACpp - how to check annotator's dlopen

2012-02-21 Thread Eddie Epstein

This error is coming from libuima, which was successfully loaded by
the JVM, given -Djava.library.path. You're sure that the [appropriate]
LD_LIBRARY_PATH is set correctly for libuima? As a test put the
annotator library in some standard place, like /usr/lib. There's no
issue with 32 vs 64 bit libraries?

Eddie
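
A minimal sketch of the check suggested above, assuming the goal is to see which search paths the JVM process itself received and whether the annotator file sits in any of them. The class name LibraryPathCheck and the helper findInPath are hypothetical, not part of UIMA or UIMACpp. The property sun.arch.data.model is HotSpot-specific and may be null on other JVMs. Finding the file does not prove that dlopen will succeed, since a missing dependency or an architecture mismatch can still make the load fail.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of UIMA: reports where a library file is visible.
class LibraryPathCheck {

    // Returns every directory of a path list (e.g. "/a:/b") that contains fileName.
    static List<String> findInPath(String pathList, String fileName) {
        List<String> hits = new ArrayList<>();
        if (pathList == null || pathList.isEmpty()) {
            return hits;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (!dir.isEmpty() && new File(dir, fileName).isFile()) {
                hits.add(dir);
            }
        }
        return hits;
    }

    // Prints one search path and the directories in it that hold the library.
    static void report(String name, String value, String lib) {
        System.out.println(name + " = " + value);
        System.out.println("  directories containing " + lib + ": " + findInPath(value, lib));
    }

    public static void main(String[] args) {
        // File name taken from the error message in this thread; pass another as args[0].
        String lib = args.length > 0 ? args[0] : "UnitexAnnotatorCpp.dylib";

        // JVM architecture, relevant to the 32 vs 64 bit question.
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));

        // Values as seen from inside the JVM, which may differ from the shell's.
        report("java.library.path", System.getProperty("java.library.path"), lib);
        report("DYLD_LIBRARY_PATH", System.getenv("DYLD_LIBRARY_PATH"), lib);
        report("LD_LIBRARY_PATH", System.getenv("LD_LIBRARY_PATH"), lib);
    }
}
```

Running it with the same launcher and options as the aggregate shows the environment the JVM actually inherited. If DYLD_LIBRARY_PATH prints as null there, the variable did not reach the process, whatever the shell reports.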

On Mon, Feb 20, 2012 at 11:54 AM, Sylvain Surcin sur...@kwaga.com wrote:
 Hello,

 Is there a way to get more info, or to trace the processes involved when
 initializing a C++ annotator?

 I have this message dumped when launching an aggregate containing some
 Java annotators and one C++ annotator:

 annotator:creating JNILogger

 20/02/12 17:38:13 - 11:
 org.apache.uima.uimacpp.UimacppAnalysisComponent.log(393): GRAVE:
 org.apache.uima.cpp: 0
 Error number  : 2001
 Recoverable   : Yes
 Error         : Error loading annotator 'UnitexAnnotatorCpp'.
 'dlopen(UnitexAnnotatorCpp.dylib, 10): image not found'

 20/02/12 17:38:13 - 11:
 org.apache.uima.uimacpp.UimacppAnalysisComponent.log(393): GRAVE:
 org.apache.uima.cpp: 0 ResourceManager::requestAnnotatorFile() failed to
 find UnitexAnnotatorCpp

 Nevertheless, the annotator's UnitexAnnotatorCpp.dylib is indeed available
 through the $DYLD_LIBRARY_PATH environment variable, and also through
 -Djava.library.path when launching the aggregate. I even checked with a
 small C program using dlopen that it is possible to load the library.

 What could I do next?

 Thanks for any advice.
 Sylvain

 --
 Sylvain SURCIN, Ph.D.
 *KWAGA*
 Senior Software Architect
 15, rue Jean-Baptiste Berlier
 75013 Paris
 France
 Tél.: +33 (0)6.32.78.83.31

InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Jens Grivolla


Hi,

It appears that InlineXMLCasConsumer depends on the system locale for
some internal transformations. The output appears to be written in UTF-8
(outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
machine with an ASCII locale all accented characters get broken.


I suspect that it has to do with the XMLSerializer working on a 
ByteArrayOutputStream, but haven't been able to track it down yet.


Any ideas?

Bye,
Jens

Re: InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Thilo Goetz

On 21/02/12 15:59, Jens Grivolla wrote:
 Hi,
 
 It appears that InlineXMLCasConsumer depends on the system locale for
 some internal transformations. The output appears to be written in UTF-8
 (outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
 machine with an ASCII locale all accented characters get broken.
 
 I suspect that it has to do with the XMLSerializer working on a
 ByteArrayOutputStream, but haven't been able to track it down yet.
 
 Any ideas?
 
 Bye,
 Jens
 

Have you checked that it's really the writing end where things
get corrupted, and not the reading end?  Just a thought...

--Thilo

Re: InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Jens Grivolla


On 02/21/2012 04:08 PM, Thilo Goetz wrote:

On 21/02/12 15:59, Jens Grivolla wrote:

It appears that InlineXMLCasConsumer depends on the system locale for
some internal transformations. The output appears to be written in UTF-8
(outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
machine with an ASCII locale all accented characters get broken.

I suspect that it has to do with the XMLSerializer working on a
ByteArrayOutputStream, but haven't been able to track it down yet.


Have you checked that it's really the writing end where things
get corrupted, and not the reading end?  Just a thought...


Yes, we have an XmiWriterCasConsumer in parallel that works fine.

Jens

Re: InlineXMLCasConsumer fails depending on locale

2012-02-21 Thread Thilo Goetz

On 21/02/12 16:15, Jens Grivolla wrote:
 On 02/21/2012 04:08 PM, Thilo Goetz wrote:
 On 21/02/12 15:59, Jens Grivolla wrote:
 It appears that InlineXMLCasConsumer depends on the system locale for
 some internal transformations. The output appears to be written in UTF-8
 (outStream.write(xmlAnnotations.getBytes("UTF-8"));) but when used on a
 machine with an ASCII locale all accented characters get broken.

 I suspect that it has to do with the XMLSerializer working on a
 ByteArrayOutputStream, but haven't been able to track it down yet.

 Have you checked that it's really the writing end where things
 get corrupted, and not the reading end?  Just a thought...
 
 Yes, we have an XmiWriterCasConsumer in parallel that works fine.
 
 Jens
 

Ah yes, eyeballing the source gives:

  // return XML string
  return new String(byteArrayOutputStream.toByteArray());

This is in CasToInlineXml.java.  I stopped after I found this,
maybe there's more.  Jira, patch, you know the drill :-)

--Thilo
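
A small sketch of why the line quoted above is locale-sensitive. The constructor new String(byte[]) decodes with the platform default charset, so UTF-8 bytes are misread when that default is ASCII. The sketch assumes the serializer wrote UTF-8 into the ByteArrayOutputStream, which the thread does not confirm. The class CharsetRoundTrip and its methods are illustrative and are not taken from CasToInlineXml.java.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics "serialize to bytes, then turn the bytes into a String".
class CharsetRoundTrip {

    // Stands in for a serializer that writes UTF-8 into a ByteArrayOutputStream.
    static byte[] utf8Bytes(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        out.write(encoded, 0, encoded.length);
        return out.toByteArray();
    }

    // Decodes with an explicit charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", escaped so the source file encoding does not matter
        byte[] bytes = utf8Bytes(original);

        // What new String(bytes) effectively does on a machine whose default charset is ASCII.
        String asAscii = decode(bytes, StandardCharsets.US_ASCII);
        // Decoding with the charset that was used for writing.
        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);

        System.out.println("default charset here: " + Charset.defaultCharset());
        System.out.println("ASCII decode round-trips: " + original.equals(asAscii)); // false
        System.out.println("UTF-8 decode round-trips: " + original.equals(asUtf8));  // true
    }
}
```

Under that assumption, a likely shape for the patch is to pass the charset explicitly when building the String, so the result no longer depends on the machine's locale.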

Re: Iterating over all annotation types of a token

2012-02-21 Thread Himanshu Gahlot

Thanks Marshall, this would work for me!

Himanshu Gahlot
Software Engineer
T 415.344.1336
235 2nd St, San Francisco, CA 94105


On Tue, Feb 21, 2012 at 11:09 AM, Marshall Schor m...@schor.com wrote:

How about something like:

First, get an iterator over tokens, using the base AnnotationIndex (e.g.
index = aCAS.getAnnotationIndex(your-token-type)). Then, I presume you
will have some kind of a loop that iterates some inner code for each token.
In the inner code, you want to have an iterator over annotations that are
contained within the begin / end span of the token (is that correct?). You
can use a subiterator for that. See
http://uima.apache.org/d/uimaj-2.4.0/references.html#ugr.ref.cas.index.annotation_index

There is a flag (strict / non-strict). Strict means that the returned
annotation lies entirely within the span of the controlling annotation;
non-strict means only that the beginning of the returned annotation falls
within the span.

Does this help?

-Marshall
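
The strict / non-strict distinction described above can be illustrated in plain Java. This is a sketch of the two containment rules only; it does not use the UIMA API, and the Span class, the covered method, and the treatment of an annotation that begins exactly at the token's end are assumptions made for illustration. In UIMA itself the subiterator on the annotation index does this work.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the two containment rules; not UIMA code.
class SpanContainment {

    // Minimal stand-in for an annotation: a type name plus begin/end offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // Returns the annotations selected for a token under the strict or non-strict rule.
    static List<Span> covered(Span token, List<Span> all, boolean strict) {
        List<Span> result = new ArrayList<>();
        for (Span a : all) {
            if (a == token) {
                continue; // skip the controlling annotation itself
            }
            // Non-strict: only the beginning must fall within the token's span.
            boolean beginsInside = a.begin >= token.begin && a.begin < token.end;
            // Strict: the end must not extend past the token's end either.
            boolean endsInside = a.end <= token.end;
            if (beginsInside && (!strict || endsInside)) {
                result.add(a);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Span token = new Span("Token", 0, 5);
        List<Span> all = new ArrayList<>();
        all.add(token);
        all.add(new Span("Location", 0, 5));      // same span as the token
        all.add(new Span("Organization", 0, 12)); // starts in the token, extends beyond it
        all.add(new Span("Token", 6, 12));        // a later token

        System.out.println("strict:     " + covered(token, all, true));
        System.out.println("non-strict: " + covered(token, all, false));
    }
}
```

With these sample spans the strict rule keeps only the Location annotation, while the non-strict rule also keeps the Organization annotation that starts inside the token and runs past its end.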

On 2/21/2012 1:44 PM, Himanshu Gahlot wrote:

Hi,

Is there a straightforward way to iterate over all the annotation types of
a token? For example, if a token has been annotated as a location,
organization, food-related entity, subject-of-a-sentence, etc. (in
addition to being annotated as a token, of course), then is there a way to
fetch all the annotations for this token without iterating over each
individual type of annotation and comparing the position of this token in
every annotation? Am I missing something obvious?

Thanks,
Himanshu Gahlot
Software Engineer
T 415.344.1336
235 2nd St, San Francisco, CA 94105

