RE: FilteredIterator is very slow

2014-04-01 Thread Kline, Larry
Thomas,

Thanks for the suggestions.  Before I read your email I happened on this
method: FSUtil.getAnnotationsIteratorInSpan(aJCas, annotationType, begin, end).
I pass it the typeId of the type I want to check and the begin and end of the
dictTerm's span.  I need to run some tests to make sure it is doing what I
assume it is, but if it is correct, it performs much better than my previous
code.  Perhaps it uses the method you suggest.

Yes, my dictTerms are much more numerous than the filter types.  Currently I
iterate over them one at a time in the caller, check whether any annotation of
the filter types overlaps the dictTerm, and if so discard it.  If it's not
discarded I go on to do other things with it.  I could probably preprocess the
dictTerms first by checking them for overlap against the filter types and
removing the matches.  Why do you think this would be faster?  Isn't it MxN
versus NxM?

Thanks again,
Larry

-----Original Message-----
From: Thomas Ginter [mailto:thomas.gin...@utah.edu] 
Sent: Monday, March 31, 2014 12:56 PM
To: user@uima.apache.org
Subject: Re: FilteredIterator is very slow

Larry,

A faster way to get the list of types that you will skip would be to do the 
following:

    FSIndex titlePersonHAIndex =
        aJCas.getAnnotationIndex(TitlePersonHonorificAnnotation.type);

Doing this for each type yields an index that points to just the annotations
of that type in the CAS.  From there you can get an iterator
(titlePersonHAIndex.iterator()) and either traverse each index separately or
add the annotations to a common Collection such as an ArrayList and iterate
through that.  You could also take advantage of the fact that the default
annotation index in UIMA sorts by ascending begin offset and then by
descending end offset, so you can stop once you have traversed the list past
the end offset of the dictTerm.
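
The early-stop idea above can be sketched without UIMA on the classpath.  This
is only an illustration under stated assumptions: Span is a hypothetical
stand-in for an annotation's begin/end offsets, spans are treated as half-open
[begin, end) ranges (UIMAUtils.annotationsOverlap may define overlap
differently), and the list is sorted by hand in the order described for the
default index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EarlyStopOverlapSketch {

    // Hypothetical stand-in for a UIMA annotation: only the offsets matter here.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Sort by ascending begin, then descending end, mirroring the order
    // described for the default annotation index.
    static List<Span> sortLikeDefaultIndex(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort((a, b) -> a.begin != b.begin
                ? Integer.compare(a.begin, b.begin)
                : Integer.compare(b.end, a.end));
        return sorted;
    }

    // True if any filter span overlaps the term. Because the filters are
    // sorted by begin, the scan stops at the first one that starts at or
    // after the term's end: every later filter starts even further right.
    static boolean overlapsAny(Span term, List<Span> sortedFilters) {
        for (Span f : sortedFilters) {
            if (f.begin >= term.end) {
                break;
            }
            if (f.end > term.begin) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Span> filters = sortLikeDefaultIndex(Arrays.asList(
                new Span(10, 15), new Span(0, 3), new Span(10, 20)));
        System.out.println(overlapsAny(new Span(2, 5), filters));
        System.out.println(overlapsAny(new Span(3, 10), filters));
    }
}
```

With real annotations the same loop would read getBegin() and getEnd() from
each annotation returned by the per-type index iterator.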

An important design decision, though, is whether the dictTerm annotations are
much more numerous than the TitlePersonHonorificAnnotation,
MeasurementAnnotation, and ProgFactorTerm filtering annotations.  Generally,
if the filter annotations are much more plentiful and the dictTerms are rarer,
then looking for overlapping filter annotations for each dictTerm will take
fewer iterations of your algorithm.  However, if there are a lot of dictTerm
occurrences and only a few filter annotations, it may be more efficient to
iterate through the filter annotations and eliminate the dictTerms that they
overlap or cover.
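
Simply swapping which annotation type drives the outer loop still costs on the
order of M x N comparisons in the worst case.  One way around that, sketched
here under assumptions and not taken from anyone's actual code in this thread,
is to merge the filter spans into sorted, non-overlapping ranges once and then
binary-search those ranges for each dictTerm.  Span, merge, and
keepNonOverlapping are hypothetical names, and spans are treated as non-empty
half-open [begin, end) ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergedFilterSketch {

    // Hypothetical stand-in for an annotation span, half-open [begin, end).
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Collapse the filter spans into sorted ranges that neither overlap nor touch.
    static List<Span> merge(List<Span> filters) {
        List<Span> sorted = new ArrayList<>(filters);
        sorted.sort((a, b) -> Integer.compare(a.begin, b.begin));
        List<Span> merged = new ArrayList<>();
        for (Span f : sorted) {
            Span last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && f.begin <= last.end) {
                // Overlapping or touching: extend the previous range.
                merged.set(merged.size() - 1,
                        new Span(last.begin, Math.max(last.end, f.end)));
            } else {
                merged.add(f);
            }
        }
        return merged;
    }

    // Binary search for the first merged range ending after term.begin,
    // then check whether that range starts before term.end.
    static boolean overlapsAny(Span term, List<Span> merged) {
        int lo = 0;
        int hi = merged.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (merged.get(mid).end > term.begin) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo < merged.size() && merged.get(lo).begin < term.end;
    }

    // Keep only the terms that no filter span overlaps.
    static List<Span> keepNonOverlapping(List<Span> terms, List<Span> filters) {
        List<Span> merged = merge(filters);
        List<Span> kept = new ArrayList<>();
        for (Span t : terms) {
            if (!overlapsAny(t, merged)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Span> filters = Arrays.asList(
                new Span(10, 15), new Span(0, 3), new Span(12, 20));
        List<Span> terms = Arrays.asList(
                new Span(2, 5), new Span(3, 10), new Span(19, 25), new Span(20, 30));
        System.out.println(keepNonOverlapping(terms, filters).size());
    }
}
```

The merge costs one sort of the N filter spans, and each of the M dictTerms
then costs one binary search, so the total is roughly (M + N) log N rather
than M x N.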

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Mar 31, 2014, at 11:47 AM, Kline, Larry wrote:

> When I use a filtered FSIterator it's an order of magnitude slower than a 
> non-filtered iterator.  Here's my code:
> 
> Create the iterator:
>   private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
>       throws CASException {
>     FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
>     FSTypeConstraint constraint =
>         aJCas.getConstraintFactory().createTypeConstraint();
>     constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
>     constraint.add((new MeasurementAnnotation(aJCas)).getType());
>     constraint.add((new ProgFactorTerm(aJCas)).getType());
>     it = aJCas.createFilteredIterator(it, constraint);
>     return it;
>   }
> Use the iterator:
>   public void process(JCas aJCas) throws AnalysisEngineProcessException {
>     ...
>     // The following is done in a loop
>     if (shouldSkip(dictTerm, skipIter))
>       continue;
>     ...
>   }
> Here's the method called:
>   private boolean shouldSkip(G2DictTerm dictTerm,
>       FSIterator<Annotation> skipIter) throws CASException {
>     boolean shouldSkip = false;
>     skipIter.moveToFirst();
>     while (skipIter.hasNext()) {
>       Annotation annotation = skipIter.next();
>       if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
>         shouldSkip = true;
>         break;
>       }
>     }
>     return shouldSkip;
>   }
> 
> If I change the method, createConstrainedIterator(), to this (that is, no 
> constraints):
>   private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
>       throws CASException {
>     FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
>     return it;
>   }
> 
> It runs literally 10 times faster.  Doing some profiling I see that all of 
> the time is spent in the skipIter.moveToFirst() call.  I also tried creating 
> the filtered iterator each time anew in the shouldSkip() method instead of 
> passing it in, but that has even slightly worse performance.
> 
> Given this performance I suppose I should probably use a non-filtered 
> iterator and just check for the types I'm interested in inside the loop.
> 
> Any other suggest

Re: What if head node fails in DUCC

2014-04-01 Thread Eddie Epstein
Each job has a job driver. All job drivers and job processes not running on
the head node continue working as long as the DUCC broker is still viable,
and an AMQ broker can easily be configured in a master/slave configuration.

Similarly, DUCC service processes and reservation processes not running on
the head node are unaffected.

Eddie


On Tue, Apr 1, 2014 at 8:26 AM, reshu.agarwal wrote:

> On 04/01/2014 05:28 PM, Eddie Epstein wrote:
>
>> Correct. Most DUCC daemons running on the head node are restartable. We
>> expect to complete this work so that in the case of head node failure a
>> new head node can automatically be started.
>>
>> Currently DUCC can be configured such that no active user work is affected
>> if a head node goes down. However, without the head node no new user
>> processes are created.
>>
>> Eddie
>>
>>
>> On Tue, Apr 1, 2014 at 1:42 AM, reshu.agarwal wrote:
>>
>>> Hi,
>>>
>>> I have a question. If the head node fails, we can no longer do UIMA
>>> processing. Can I define multiple head nodes in DUCC, so that if one
>>> head node fails a second node takes over as the head node? Is this
>>> possible? What is the backup strategy of DUCC?
>>>
>>> --
>>> Thanks,
>>> Reshu Agarwal
>>>
>
> Hi Eddie,
>
> Sorry, I did not follow. What do you mean by "Currently DUCC can be
> configured such that no active user work is affected if a head node goes
> down"?
>
> As I understand it, if a node goes down or crashes, all the processes
> running on that node terminate. So the big question is: how does DUCC
> ensure that no active user work is affected if the head node goes down?
>
> --
> Thanks,
> Reshu Agarwal
>
>


Re: What if head node fails in DUCC

2014-04-01 Thread reshu.agarwal

On 04/01/2014 05:28 PM, Eddie Epstein wrote:

> Correct. Most DUCC daemons running on the head node are restartable. We
> expect to complete this work so that in the case of head node failure a new
> head node can automatically be started.
>
> Currently DUCC can be configured such that no active user work is affected
> if a head node goes down. However, without the head node no new user
> processes are created.
>
> Eddie
>
> On Tue, Apr 1, 2014 at 1:42 AM, reshu.agarwal wrote:
>
>> Hi,
>>
>> I have a question. If the head node fails, we can no longer do UIMA
>> processing. Can I define multiple head nodes in DUCC, so that if one
>> head node fails a second node takes over as the head node? Is this
>> possible? What is the backup strategy of DUCC?
>>
>> --
>> Thanks,
>> Reshu Agarwal

Hi Eddie,

Sorry, I did not follow. What do you mean by "Currently DUCC can be
configured such that no active user work is affected if a head node goes
down"?

As I understand it, if a node goes down or crashes, all the processes
running on that node terminate. So the big question is: how does DUCC
ensure that no active user work is affected if the head node goes down?


--
Thanks,
Reshu Agarwal



Re: Threadsafe advice for using SharedResourceObject

2014-04-01 Thread Richard Eckart de Castilho
I assume you are talking about runtime, not initialization time.

At runtime, it is convenient if the shared object has no mutable
state. If it does, you might want to wrap that state in a ThreadLocal
variable.
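
A minimal sketch of the ThreadLocal suggestion, assuming the mutable state
is something like a SimpleDateFormat, which is not thread-safe. The class
and method names are hypothetical. A real resource would also implement
SharedResourceObject and its load(DataResource) method; that part is left
out here so the example runs without UIMA.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class ThreadLocalResourceSketch {

    // One formatter per thread: threads sharing this resource never touch
    // each other's SimpleDateFormat instance.
    private final ThreadLocal<SimpleDateFormat> format =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                return f;
            });

    // Safe to call from several processing threads at once.
    String formatDay(long epochMillis) {
        return format.get().format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        ThreadLocalResourceSketch shared = new ThreadLocalResourceSketch();
        System.out.println(shared.formatDay(0L));
    }
}
```

The trade-off is one copy of the mutable state per processing thread, which
is usually acceptable for small helper objects like this one.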

If there is a problem during initialization, that would be a bug
I suppose.

-- Richard

On 01.04.2014, at 14:05, Marshall Schor wrote:

> On 4/1/2014 7:05 AM, Swril wrote:
>> I have an AE that I am running in a CPE.
>> The AE has an ExternalResource to a SharedResourceObject.
>> I set setMaxProcessingUnitThreadCount to 5.
>>
>> When I run the pipeline, I get errors such as NPEs, which hint that
>> things are out of step because of the multiple threads.
>>
>> Is there any advice on using an AE with an ExternalResource in a CPE?
> 
> One thought: it's probably necessary to write the code you supply for the
> SharedResourceObject to be thread safe.  A good source for understanding the
> complex details around how to do this is
> http://lmgtfy.com/?q=java+concurrency+in+practice  .
> -Marshall


Re: Threadsafe advice for using SharedResourceObject

2014-04-01 Thread Marshall Schor

On 4/1/2014 7:05 AM, Swril wrote:
> I have an AE that I am running in a CPE.
> The AE has an ExternalResource to a SharedResourceObject.
> I set setMaxProcessingUnitThreadCount to 5.
>
> When I run the pipeline, I get errors such as NPEs, which hint that
> things are out of step because of the multiple threads.
>
> Is there any advice on using an AE with an ExternalResource in a CPE?

One thought: it's probably necessary to write the code you supply for the
SharedResourceObject to be thread safe.  A good source for understanding the
complex details around how to do this is
http://lmgtfy.com/?q=java+concurrency+in+practice  .
-Marshall
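
As a rough illustration of what thread-safe can look like for a shared
object that must keep mutable state, here is a sketch that relies on
java.util.concurrent classes instead of a plain HashMap. The class, its
methods, and the counting use case are hypothetical and not taken from the
original poster's resource; the SharedResourceObject plumbing is omitted so
the example runs without UIMA.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class ThreadSafeResourceSketch {

    // ConcurrentHashMap and LongAdder tolerate concurrent updates; a plain
    // HashMap<String, Long> could lose updates or corrupt its table here.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    void record(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long count(String key) {
        LongAdder adder = counts.get(key);
        return adder == null ? 0L : adder.sum();
    }

    // Demo helper: several threads update one shared instance.
    static long hammer(int threads, int perThread) {
        ThreadSafeResourceSketch shared = new ThreadSafeResourceSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    shared.record("term");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.count("term");
    }

    public static void main(String[] args) {
        System.out.println(hammer(5, 10000));
    }
}
```

If the resource is only read after it has been loaded, making its fields
final and its collections unmodifiable is simpler still.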



Re: What if head node fails in DUCC

2014-04-01 Thread Eddie Epstein
Correct. Most DUCC daemons running on the head node are restartable. We
expect to complete this work so that in the case of head node failure a new
head node can automatically be started.

Currently DUCC can be configured such that no active user work is affected
if a head node goes down. However, without the head node no new user
processes are created.

Eddie


On Tue, Apr 1, 2014 at 1:42 AM, reshu.agarwal wrote:

>
> Hi,
>
> I have a question. If the head node fails, we can no longer do UIMA
> processing. Can I define multiple head nodes in DUCC, so that if one head
> node fails a second node takes over as the head node? Is this possible?
> What is the backup strategy of DUCC?
>
> --
> Thanks,
> Reshu Agarwal
>
>


Re: problem in calling DUCC Service with ducc_submit

2014-04-01 Thread Eddie Epstein
Declaring a service dependency does not affect application code paths. The
job still needs to connect to the service in the normal way.

DUCC uses service dependencies for several reasons: to automatically start
services when needed by a job; to not give resources to a job or service
for which a dependent service is not running; and to post a warning on
running jobs when a dependent service goes "bad".

Eddie


On Tue, Apr 1, 2014 at 1:27 AM, reshu.agarwal wrote:

>
> Hi,
>
> I have another problem. I have successfully deployed a DUCC UIMA AS service
> using ducc_service. The service status is available with good health. If I
> reference this service in my job through the service_dependency parameter of
> ducc_submit, no error is shown, but only the DB Collection Reader executes,
> not the service.
>
> --
> Thanks,
> Reshu Agarwal
>
>


Threadsafe advice for using SharedResourceObject

2014-04-01 Thread Swril
I have an AE that I am running in a CPE.
The AE has an ExternalResource to a SharedResourceObject.
I set setMaxProcessingUnitThreadCount to 5.

When I run the pipeline, I get errors such as NPEs, which hint that
things are out of step because of the multiple threads.

Is there any advice on using an AE with an ExternalResource in a CPE?