Re: Sorting overlapping annotation of same type using UIMAFIT

2016-11-21 Thread William Colen
Thank you, Marshall.
What if they are of the same type?
The workaround I used was to add a feature that stores an integer, which I
use to sort the annotations. It is not a good approach because the user
will need to remember to sort them before use.

Thank you
William
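A minimal sketch of the workaround described above, assuming a hypothetical `Token` class with `begin`, `end`, a POS string, and an extra integer `order` field standing in for the added feature (the thread does not name it). It sorts in the same begin/end order as the annotation index and uses `order` only to break ties. This sorting step is exactly what callers would have to remember after selecting the tokens.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TokenOrderWorkaround {
    // Hypothetical stand-in for a Token annotation; "order" plays the role
    // of the extra integer feature mentioned in the thread.
    static final class Token {
        final int begin, end, order;
        final String pos;
        Token(int begin, int end, int order, String pos) {
            this.begin = begin; this.end = end; this.order = order; this.pos = pos;
        }
    }

    // Begin ascending, end descending, then the explicit order feature.
    static final Comparator<Token> BY_POSITION_THEN_ORDER =
        Comparator.<Token>comparingInt(t -> t.begin)
            .thenComparing(Comparator.<Token>comparingInt(t -> t.end).reversed())
            .thenComparingInt(t -> t.order);

    // Returns a sorted copy; the caller must apply this after selecting tokens.
    static List<Token> sorted(List<Token> tokens) {
        List<Token> copy = new ArrayList<>(tokens);
        copy.sort(BY_POSITION_THEN_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        String text = "Nós acreditávamos nele.";
        int b = text.indexOf("nele");
        int e = b + "nele".length();
        // Both halves of the contraction "nele" ("em" + "ele") share one span.
        List<Token> tokens = List.of(
            new Token(b, e, 1, "pronoun"),
            new Token(b, e, 0, "preposition"));
        for (Token t : sorted(tokens)) {
            System.out.println(text.substring(t.begin, t.end) + " " + t.pos);
        }
    }
}
```

Here the preposition token is printed before the pronoun token because its `order` value is lower, even though both share the same begin and end.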

2016-11-21 20:10 GMT-02:00 Marshall Schor :

> The select form you're using iterates using UIMA's built-in Annotation
> index.
> This index sorts the annotations based on 3 criteria:
>
> 1) the begin (ascending order)
>
> 2) the end (descending order)
>
> 3) the type priority
>
> You can use the 3rd criterion to set a preference ordering between two
> annotations of different types that have the same begin / end.
> You specify the type priorities as part of the Analysis Engine metadata; see
> http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive
>
> -Marshall
>
> On 11/20/2016 9:52 PM, William Colen wrote:
> > Hi,
> >
> > In Portuguese we have contractions, which are words composed of, for
> > example, a preposition plus an article, pronoun, or adverb.
> >
> > Example:
> >
> > Nós acreditávamos nele. (We believed him.)
> >
> > Where "nele" can be divided into "em" + "ele". (in + him)
> >
> > To analyze this properly, I created two token annotations with the same
> > begin and end: the first associated with the POS tag preposition, and
> > the second with pronoun.
> >
> > This is especially important when we are doing chunking, because the first
> > token will be part of a prepositional phrase, while the second will be part
> > of a nominal phrase.
> >
> > How can I guarantee that when I call uimaFIT's JCasUtil.select I get the
> > tokens in order, first the preposition, then the pronoun?
> >
> > Thank you,
> > William
> >
>
>


Re: Sorting overlapping annotation of same type using UIMAFIT

2016-11-21 Thread Marshall Schor
The select form you're using iterates using UIMA's built-in Annotation index.
This index sorts the annotations based on 3 criteria:

1) the begin (ascending order)

2) the end (descending order)

3) the type priority

You can use the 3rd criterion to set a preference ordering between two annotations
of different types that have the same begin / end.
You specify the type priorities as part of the Analysis Engine metadata; see
http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive

-Marshall
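To make the three sort keys concrete, here is a self-contained sketch that reproduces the ordering with a plain `Comparator`. The `Ann` class and the type names `Preposition`, `Pronoun`, `Sentence`, and `Token` are hypothetical, and the priority list stands in for the type priorities declared in the descriptor. It models the ordering only; it does not call the UIMA or uimaFIT API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AnnotationIndexOrder {
    // Hypothetical stand-in for an annotation; not org.apache.uima.jcas.tcas.Annotation.
    static final class Ann {
        final String type;
        final int begin, end;
        Ann(String type, int begin, int end) { this.type = type; this.begin = begin; this.end = end; }
        @Override public String toString() { return type + "[" + begin + "," + end + ")"; }
    }

    // Position of the type in the priority list; unlisted types sort after listed ones.
    static int priority(List<String> priorities, String type) {
        int i = priorities.indexOf(type);
        return i < 0 ? Integer.MAX_VALUE : i;
    }

    // 1) begin ascending, 2) end descending, 3) type priority.
    static Comparator<Ann> order(List<String> priorities) {
        return Comparator.<Ann>comparingInt(a -> a.begin)
            .thenComparing(Comparator.<Ann>comparingInt(a -> a.end).reversed())
            .thenComparingInt(a -> priority(priorities, a.type));
    }

    public static void main(String[] args) {
        List<Ann> anns = new ArrayList<>(List.of(
            new Ann("Pronoun", 18, 22),
            new Ann("Preposition", 18, 22),
            new Ann("Sentence", 0, 23),
            new Ann("Token", 0, 3)));
        anns.sort(order(List.of("Sentence", "Preposition", "Pronoun")));
        System.out.println(anns);
    }
}
```

Running `main` prints `[Sentence[0,23), Token[0,3), Preposition[18,22), Pronoun[18,22)]`. Note that the third key only separates annotations of different types; two annotations of the same type and span get the same priority, which is the situation raised in the follow-up question.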

On 11/20/2016 9:52 PM, William Colen wrote:
> Hi,
>
> In Portuguese we have contractions, which are words composed of, for
> example, a preposition plus an article, pronoun, or adverb.
>
> Example:
>
> Nós acreditávamos nele. (We believed him.)
>
> Where "nele" can be divided into "em" + "ele". (in + him)
>
> To analyze this properly, I created two token annotations with the same
> begin and end: the first associated with the POS tag preposition, and
> the second with pronoun.
>
> This is especially important when we are doing chunking, because the first
> token will be part of a prepositional phrase, while the second will be part
> of a nominal phrase.
>
> How can I guarantee that when I call uimaFIT's JCasUtil.select I get the
> tokens in order, first the preposition, then the pronoun?
>
> Thank you,
> William
>



Re: No service reply, after org.xml.sax.SAXParseException; Trying to serialize non-XML 1.0 character:

2016-11-21 Thread Jaroslaw Cwiklik
Nelson, a fix for this is part of JIRA UIMA-5189, which addresses error
handling when a serializer throws an exception.
I will post the UIMA-AS 2.9.0 release candidate tomorrow so you can test your
use case. Watch for an email on the uima dev list.
Jerry

On Mon, Nov 21, 2016 at 4:17 PM, nelson rivera 
wrote:

> I tried to process an input CAS in an aggregate service deployed in
> UIMA-AS. The annotations produced by the annotators apparently contain an
> invalid character. After processing finishes, when the framework tries to
> send the reply, it logs an org.xml.sax.SAXParseException while serializing
> the CAS. On the client side I get no reply: the associated listener is not
> notified of the error, and the client program keeps waiting.
>

No service reply, after org.xml.sax.SAXParseException; Trying to serialize non-XML 1.0 character:

2016-11-21 Thread nelson rivera
I tried to process an input CAS in an aggregate service deployed in
UIMA-AS. The annotations produced by the annotators apparently contain an
invalid character. After processing finishes, when the framework tries to
send the reply, it logs an org.xml.sax.SAXParseException while serializing
the CAS. On the client side I get no reply: the associated listener is not
notified of the error, and the client program keeps waiting.

The error log of the aggregate service:

03:50:03.578 - 22:
org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.replyToClient:
WARNING: Service: XDataFileExtractorAggregate Runtime Exception
03:50:03.579 - 22:
org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.replyToClient:
WARNING:
org.apache.uima.aae.error.AsynchAEException: org.xml.sax.SAXParseException; Trying to serialize non-XML 1.0 character: , 0x1 at offset 0 in string starting with
at org.apache.uima.adapter.jms.activemq.JmsOutputChannel.getSerializedCas(JmsOutputChannel.java:1258)
at org.apache.uima.adapter.jms.activemq.JmsOutputChannel.sendReply(JmsOutputChannel.java:793)
at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.sendReplyToRemoteClient(AggregateAnalysisEngineController_impl.java:2166)
at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.replyToClient(AggregateAnalysisEngineController_impl.java:2335)
at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.finalStep(AggregateAnalysisEngineController_impl.java:1855)
at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.executeFlowStep(AggregateAnalysisEngineController_impl.java:2482)
at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.process(AggregateAnalysisEngineController_impl.java:1264)
at org.apache.uima.aae.handler.HandlerBase.invokeProcess(HandlerBase.java:118)
at org.apache.uima.aae.handler.input.ProcessResponseHandler.cancelTimerAndProcess(ProcessResponseHandler.java:117)
at org.apache.uima.aae.handler.input.ProcessResponseHandler.handleProcessResponseWithCASReference(ProcessResponseHandler.java:485)
at org.apache.uima.aae.handler.input.ProcessResponseHandler.handle(ProcessResponseHandler.java:767)
at org.apache.uima.aae.handler.HandlerBase.delegate(HandlerBase.java:149)
at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:1085)
at org.apache.uima.aae.spi.transport.vm.UimaVmMessageListener.onMessage(UimaVmMessageListener.java:107)
at org.apache.uima.aae.spi.transport.vm.UimaVmMessageDispatcher$1.run(UimaVmMessageDispatcher.java:70)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.uima.aae.UimaAsThreadFactory$1.run(UimaAsThreadFactory.java:132)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.xml.sax.SAXParseException; Trying to serialize non-XML 1.0 character: , 0x1 at offset 0 in string starting with
at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:374)
at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:275)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.startElement(XmiCasSerializer.java:1197)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.writeFsOrLists(XmiCasSerializer.java:711)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.writeFs(XmiCasSerializer.java:697)
at org.apache.uima.cas.impl.CasSerializerSupport$CasDocSerializer.encodeFS(CasSerializerSupport.java:)
at org.apache.uima.cas.impl.CasSerializerSupport$CasDocSerializer.encodeQueued(CasSerializerSupport.java:1015)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.writeFeatureStructures(XmiCasSerializer.java:563)
at org.apache.uima.cas.impl.CasSerializerSupport$CasDocSerializer.serialize(CasSerializerSupport.java:439)
at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:415)
at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:385)
at org.apache.uima.aae.UimaSerializer.serializeCasToXmi(UimaSerializer.java:145)
at org.apache.uima.adapter.jms.activemq.JmsOutputChannel.serializeCAS(JmsOutputChannel.java:244)
at org.apache.uima.adapter.jms.activemq.JmsOutputChannel.getSerializedCas(JmsOutputChannel.java:1243)
... 18 more
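The serializer rejects characters outside the XML 1.0 `Char` production (here U+0001). One possible mitigation, not suggested in the thread, is to scrub annotator output before storing it in the CAS. The `XmlCharSanitizer` class below is hypothetical; it checks code points against the ranges the XML 1.0 specification allows (`#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`) and replaces anything else with a space.

```java
public final class XmlCharSanitizer {
    private XmlCharSanitizer() {}

    // True if the code point matches the XML 1.0 Char production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index of the first invalid char, or -1 if the string is safe to serialize.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) return i;
            i += Character.charCount(cp);
        }
        return -1;
    }

    // Replaces each invalid char (always a single UTF-16 unit, e.g. a control
    // char or unpaired surrogate) with one space, so the string length and any
    // offsets into it stay unchanged.
    static String sanitize(String s) {
        if (firstInvalidIndex(s) < 0) return s;
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) sb.appendCodePoint(cp); else sb.append(' ');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bad = "\u0001abc";
        System.out.println(firstInvalidIndex(bad));        // 0, like "0x1 at offset 0"
        System.out.println(sanitize(bad).equals(" abc"));  // true
    }
}
```

Replacing rather than deleting is a deliberate choice: if the text is the document text, deleting characters would shift every annotation offset after the removed one.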


Re: TERMINATE Action with org.xml.sax.SAXParseException in deserializeCasFromXmi function

2016-11-21 Thread Jaroslaw Cwiklik
Nelson, I've created a JIRA for this bug:
https://issues.apache.org/jira/browse/UIMA-5189

This will be fixed soon and will be part of the next UIMA-AS release
(2.9.0).

Thanks for finding the bug.
Jerry
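The deserialization failure in the quoted trace below is raised by the XML parser itself: XML 1.0 forbids even an escaped character reference to a control character such as `&#1;`. This short JDK-only sketch (the class name `CharRefCheck` is hypothetical) reproduces that rejection with the built-in SAX parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CharRefCheck {
    // Returns true if the JDK's default SAX parser accepts the document,
    // false if it reports a well-formedness error.
    static boolean parses(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<text>plain text</text>"));
        // &#1; is valid reference syntax but names a character XML 1.0 disallows.
        System.out.println(parses("<text>&#1;</text>"));
    }
}
```

`main` prints `true` for the plain document and `false`, after the parser's rejection message, for the one containing `&#1;`.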

On Fri, Nov 18, 2016 at 3:39 PM, Jaroslaw Cwiklik  wrote:

> Hi, looks like a bug. Will take a look on Monday.
> Thanks
> Jerry
>
> On Fri, Nov 18, 2016 at 11:12 AM, nelson rivera 
> wrote:
>
>> I have an aggregate service deployed in UIMA-AS. When I send an input CAS
>> whose text apparently contains an invalid character, an error occurs
>> while deserializing the CAS, and the framework stops the aggregate
>> service.
>>
>> This is the complete stack trace:
>>
>> 09:54:38.24 - 1:
>> org.apache.uima.adapter.jms.activemq.SpringContainerDeployer.doStartListeners:
>> INFO: Controller: XTokenizerAggregate Trying to Start Listener on Endpoint: queue://XTokenizerAggregate Selector: Command=2000 OR Command=2002 Broker: tcp://localhost:61616
>> 09:54:38.193 - 1:
>> org.apache.uima.adapter.jms.activemq.SpringContainerDeployer.doStartListeners:
>> INFO: Controller: XTokenizerAggregate Trying to Start Listener on Endpoint: queue://XTokenizerAggregate Selector: Command=2001 Broker: tcp://localhost:61616
>> 09:55:11.411 - 16:
>> org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient:
>> WARNING: Service: XTokenizerAggregate Runtime Exception
>> 09:55:11.411 - 16:
>> org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient:
>> WARNING:
>> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 585; Character reference "&#
>> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
>> at org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:187)
>> at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.deserializeCASandRegisterWithCache(ProcessRequestHandler_impl.java:220)
>> at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient(ProcessRequestHandler_impl.java:531)
>> at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:1062)
>> at org.apache.uima.aae.handler.input.MetadataRequestHandler_impl.handle(MetadataRequestHandler_impl.java:78)
>> at org.apache.uima.adapter.jms.activemq.JmsInputChannel.onMessage(JmsInputChannel.java:731)
>> at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:689)
>> at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:649)
>> at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:619)
>> at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:307)
>> at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:245)
>> at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1144)
>> at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1136)
>> at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1033)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at org.apache.uima.aae.UimaAsThreadFactory$1.run(UimaAsThreadFactory.java:132)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> 09:55:11.412 - 16:
>> org.apache.uima.aae.error.handler.ProcessCasErrorHandler.handleError:
>> WARNING: Service: XTokenizerAggregate Runtime Exception
>> 09:55:11.412 - 16:
>> org.apache.uima.aae.error.handler.ProcessCasErrorHandler.handleError:
>> WARNING:
>> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 585; Character reference "&#
>> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
>> at org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:187)
>> at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.deserializeCASandRegisterWithCache(ProcessRequestHandler_impl.java:220)
>> at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl
>> .handleProcessRequestFromRemoteClient(ProcessRequestHandler_
>