Re: Issue while applying COUNT condition in UIMA RUTA

2018-07-06 Thread Peter Klügl
Hi,


thank you for reporting this. This is a severe bug and I will fix it as
soon as possible.


Best,


Peter


On 05.07.2018 at 08:00, amyjackson...@gmail.com wrote:
> I used the COUNT condition to find the number of punctuation marks in an 
> annotation, but I didn't receive the expected output.
>
>  DECLARE Sentence(INT pmcount);
>  "Conflicts of interest"->Sentence;
>
>  DECLARE SentenceLastToken;
>  
> Sentence{-PARTOF(SentenceLastToken)->MARKLAST(SentenceLastToken)};
>  INT Pmcount=0; 
>  
>  Sentence->{ANY+?{->SHIFT(Sentence,1,1,true)} 
> SentenceLastToken{PARTOF(PM)};};
>  Sentence{COUNT(PM,Pmcount)->Sentence.pmcount=Pmcount};
>
> **Sample Input:**
>
>  Conflicts of interest.
>
> **Expected Output:**
>
>   Conflicts of interest
>pmcount:0
>  
> 
> **Received Output:**
>
>   Conflicts of interest
>pmcount:1
>  
> I face this problem only if a PM follows the annotated text.
> 
>   

-- 
Peter Klügl
R Text Mining/Machine Learning

Averbis GmbH
Tennenbacher Str. 11
79106 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.klu...@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó



Re: Problem in running DUCC Job for Arabic Language

2018-07-06 Thread rohit14csu173
Yes, if I run the AE as a DUCC UIMA-AS service and send it CASes from a UIMA-AS 
client, it works fine.
In fact, the environment, i.e. the LANG setting, is the same for the UIMA-AS 
service and the DUCC job.

Environ[3] = LANG=en_IN

And if I change it to LANG=ar, the Arabic text is already replaced with ??? 
(question marks) by the time the data arrives in the JD, and the data arriving 
in the JD or CR is reported as ASCII-encoded.
I don't understand why this is happening.
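
One way to check whether the question marks come from a charset that cannot
represent Arabic is to encode a sample string by hand. The sketch below is not
DUCC code: the class name CharsetCheck and the sample word are made up for
illustration, and it only assumes that somewhere on the way the text is turned
into bytes with a charset such as US-ASCII. It also prints the JVM default
charset, which on many JVMs follows the locale environment of the process.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of UIMA or DUCC.
class CharsetCheck {
    // Encode the text with the given charset and decode it again with the same one.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Arabic sample word written with Unicode escapes, so the source file
        // encoding does not matter.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // The charset the JVM uses when no charset is passed explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        // US-ASCII cannot represent Arabic; String.getBytes substitutes '?'.
        System.out.println("US-ASCII: " + roundTrip(arabic, StandardCharsets.US_ASCII));

        // UTF-8 can represent it, so the round trip is lossless.
        System.out.println("UTF-8 intact: "
                + roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));
    }
}
```

If the first line printed inside the job process names an ASCII charset, any
code that converts text to bytes without naming a charset would produce the
same ??? pattern.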

Best
Rohit 


On 2018/07/05 13:35:11, Eddie Epstein  wrote: 
> So if you run the AE as a DUCC UIMA-AS service and send it CASes from some
> UIMA-AS client it works OK? The full environment for all processes that
> DUCC launches are available via ducc-mon under the Specification or
> Registry tab for that job or managed reservation or service. Please see if
> the LANG setting for the service is different from the LANG setting for the
> job.
> 
> One can also see the LANG setting for a linux process-id by doing:
> 
> cat /proc//environ
> 
> The LANG to be used for a DUCC process can be set by adding to the
> --environment argument "LANG=xxx" as needed
> 
> Thanks,
> Eddie
> 
> 
> 
> On Thu, Jul 5, 2018 at 6:47 AM, rohit14csu...@ncuindia.edu <
> rohit14csu...@ncuindia.edu> wrote:
> 
> > Hey,
> > Yes, you got it right: the first snippet runs in the CR before the data goes
> > into the CAS.
> > The second snippet is in the first annotator or analysis engine (AE) of
> > my aggregate descriptor.
> > I am pretty sure this is an issue with the CAS used by DUCC, because if I use
> > a DUCC service, where we send the CAS and receive the same CAS back with
> > added features, I get accurate results.
> >
> > The problem only occurs when submitting a job, where the CAS is generated
> > by DUCC.
> > This could also be an issue with the environment (language) of DUCC, because
> > the default language is English.
> >
> > Best Regards
> > Rohit
> >
> > On 2018/07/03 13:11:50, Eddie Epstein  wrote:
> > > Rohit,
> > >
> > > Before sending the data into jcas if i force encode it :-
> > > >
> > > > String content2 = null;
> > > > content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
> > > > jcas.setDocumentText(content2);
> > > >
> > >
> > > Where is this code, in the job CR?
> > >
> > >
> > >
> > > >
> > > > And when I get to my first annotator, I force-decode it:
> > > >
> > > > String content = null;
> > > > content = new String(jcas.getDocumentText().getBytes("ISO-8859-1"),
> > > > "UTF-8");
> > > >
> > >
> > > And is this in the first annotator of the job process, i.e. the CM?
> > >
> > > Please be as specific as possible.
> > >
> > > Thanks,
> > > Eddie
> > >
> >
>
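
The force-encode and force-decode snippets quoted above can be checked in
isolation, outside of DUCC. This is only a sketch: the class name Latin1Wrap
and the method names wrap and unwrap are made up, and it says nothing about
what DUCC does with the text in between. It shows that the two steps are exact
inverses, because ISO-8859-1 maps every byte value to exactly one character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of UIMA or DUCC.
class Latin1Wrap {
    // Reinterpret the UTF-8 bytes of the text as ISO-8859-1 characters,
    // so every character in the result is at most U+00FF.
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
    }

    // Reverse step: recover the original bytes and decode them as UTF-8.
    static String unwrap(String wrapped) {
        return new String(wrapped.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Arabic sample word written with Unicode escapes.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        String wrapped = wrap(arabic);

        // Five Arabic letters take two UTF-8 bytes each, so ten characters.
        System.out.println("wrapped length: " + wrapped.length());
        System.out.println("restored: " + unwrap(wrapped).equals(arabic));
    }
}
```

Note that the wrapped string still contains characters above U+007F, so it
would not survive a channel that is strictly 7-bit ASCII.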