RE: FilteredIterator is very slow
Thomas,

Thanks for the suggestions. Before I read your email I happened on this method: FSUtil.getAnnotationsIteratorInSpan(aJCas, annotationType, begin, end). I pass it the type ID of the type I want to check and the begin and end of the dictTerm's span. I need to run some tests to make sure it is doing what I assume it is, but if it is correct, it has much better performance than my previous code. Perhaps it is using the method you suggest internally.

Yes, my dictTerms are much more numerous than the filter types, but currently I iterate over them one at a time in the caller, ask whether any of the filter types overlaps, and if so discard the dictTerm. If it's not discarded I go on to do other things with it. I could probably preprocess the dictTerms first by checking them for overlap against the filter types and removing the matches. Why do you think this would be faster? Isn't it MxN versus NxM?

Thanks again,
Larry

-----Original Message-----
From: Thomas Ginter [mailto:thomas.gin...@utah.edu]
Sent: Monday, March 31, 2014 12:56 PM
To: user@uima.apache.org
Subject: Re: FilteredIterator is very slow

Larry,

A faster way to get the list of types that you will skip would be to do the following:

    FSIndex titlePersonHAIndex = aJCas.getAnnotationIndex(TitlePersonHonorificAnnotation.type);

Doing this for each type will yield an index that points to just the annotations in the CAS of each type you are interested in. From there you can get an iterator reference ( titlePersonHAIndex.iterator() ) and either traverse each one separately or else add them to a common Collection such as an ArrayList and iterate through that. You could also take advantage of the fact that the default index in UIMA sorts in ascending order on the begin offset and descending order on the end offset, so you can stop once you have traversed the list past the ending index of the dictTerm.
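[Editor's note: to make the ordering trick above concrete, here is a plain-Java sketch of scanning a begin-ascending / end-descending sorted list and stopping early. The Span record and method names are illustrative stand-ins, not the UIMA API.]

```java
import java.util.*;

// Sketch of the early-termination idea: annotations kept in the default
// UIMA index order (begin ascending, end descending) can be scanned and
// abandoned as soon as begin passes the end of the query span.
class SortedIndexScan {
    // Minimal stand-in for an annotation's span (not a UIMA type).
    record Span(int begin, int end) {}

    // Collect spans overlapping the half-open query [begin, end), stopping
    // once the sort order guarantees no later span can overlap.
    static List<Span> overlappingInSpan(List<Span> sorted, int begin, int end) {
        List<Span> hits = new ArrayList<>();
        for (Span s : sorted) {
            if (s.begin() >= end) break;      // begin is ascending: we are done
            if (s.end() > begin) hits.add(s); // overlaps the query span
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Span> index = new ArrayList<>(List.of(
                new Span(0, 4), new Span(5, 20), new Span(5, 9), new Span(30, 35)));
        // Default-index order: begin ascending, then end descending.
        index.sort(Comparator.<Span>comparingInt(Span::begin)
                .thenComparing(Comparator.<Span>comparingInt(Span::end).reversed()));
        // Matches (5,20) and (5,9); the loop breaks as soon as it reaches (30,35).
        System.out.println(overlappingInSpan(index, 6, 12));
    }
}
```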
An important design decision, though, is whether the dictTerm annotations are much more numerous than the TitlePersonHonorificAnnotation, MeasurementAnnotation, and ProgFactorTerm filtering annotation types. Generally, if the filter types are much more plentiful and the dictTerm type is rarer, then looking for overlapping filter types from each dictTerm will yield fewer iterations of your algorithm; however, if there are a lot of dictTerm occurrences and only a few of the filter types, then it may be more efficient to iterate through the filter types and eliminate the dictTerms that overlap or are covered.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu

On Mar 31, 2014, at 11:47 AM, Kline, Larry wrote:

> When I use a filtered FSIterator it's an order of magnitude slower than a
> non-filtered iterator. Here's my code:
>
> Create the iterator:
>
>     private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
>             throws CASException {
>         FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
>         FSTypeConstraint constraint =
>                 aJCas.getConstraintFactory().createTypeConstraint();
>         constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
>         constraint.add((new MeasurementAnnotation(aJCas)).getType());
>         constraint.add((new ProgFactorTerm(aJCas)).getType());
>         it = aJCas.createFilteredIterator(it, constraint);
>         return it;
>     }
>
> Use the iterator:
>
>     public void process(JCas aJCas) throws AnalysisEngineProcessException {
>         ...
>         // The following is done in a loop
>         if (shouldSkip(dictTerm, skipIter))
>             continue;
>         ...
>     }
>
> Here's the method called:
>
>     private boolean shouldSkip(G2DictTerm dictTerm, FSIterator<Annotation> skipIter)
>             throws CASException {
>         boolean shouldSkip = false;
>         skipIter.moveToFirst();
>         while (skipIter.hasNext()) {
>             Annotation annotation = skipIter.next();
>             if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
>                 shouldSkip = true;
>                 break;
>             }
>         }
>         return shouldSkip;
>     }
>
> If I change the method, createConstrainedIterator(), to this (that is, no
> constraints):
>
>     private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
>             throws CASException {
>         return aJCas.getAnnotationIndex().iterator();
>     }
>
> it runs literally 10 times faster. Doing some profiling I see that all of
> the time is spent in the skipIter.moveToFirst() call. I also tried creating
> the filtered iterator anew each time in the shouldSkip() method instead of
> passing it in, but that has even slightly worse performance.
>
> Given this performance I suppose I should probably use a non-filtered
> iterator and just check for the types I'm interested in inside the loop.
>
> Any other suggestions?
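[Editor's note: the thread's UIMAUtils.annotationsOverlap() is not shown, so here is a hypothetical plain-Java equivalent, together with the "preprocess the dictTerms" idea Larry describes. Either nesting of the loops still visits up to M x N pairs; the practical win over the filtered iterator is that the inner step is a cheap arithmetic test over a prebuilt list rather than a costly skipIter.moveToFirst() reset per dictTerm. All names are illustrative.]

```java
import java.util.*;

// Hypothetical stand-in for an annotationsOverlap() check, plus the
// preprocessing pass: drop every dictTerm that overlaps any filter span
// before doing further work. Plain int spans stand in for annotations.
class OverlapFilter {
    record Span(int begin, int end) {}

    // Two half-open spans [b, e) overlap iff each begins before the other ends.
    static boolean overlaps(Span a, Span b) {
        return a.begin() < b.end() && b.begin() < a.end();
    }

    // Keep only the dictTerms that overlap no filter span. Worst case M x N,
    // but the inner loop is a cheap comparison, not an iterator reset.
    static List<Span> dropOverlapping(List<Span> dictTerms, List<Span> filters) {
        List<Span> kept = new ArrayList<>();
        for (Span term : dictTerms) {
            boolean skip = false;
            for (Span f : filters) {
                if (overlaps(term, f)) { skip = true; break; }
            }
            if (!skip) kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Span> terms = List.of(new Span(0, 5), new Span(10, 15), new Span(20, 25));
        List<Span> filters = List.of(new Span(12, 14));
        System.out.println(dropOverlapping(terms, filters)); // (10,15) is dropped
    }
}
```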
Re: What if head node fails in DUCC
Each job has a job driver. All job drivers and job processes not running on the head node continue working as long as the DUCC broker is still viable, and an AMQ broker can easily be configured in a master/slave configuration. Similarly, DUCC service processes and reservation processes not running on the head node are unaffected.

Eddie

On Tue, Apr 1, 2014 at 8:26 AM, reshu.agarwal wrote:

> On 04/01/2014 05:28 PM, Eddie Epstein wrote:
>
>> Correct. Most DUCC daemons running on the head node are restartable. We
>> expect to complete this work so that in the case of head node failure a
>> new head node can automatically be started.
>>
>> Currently DUCC can be configured such that no active user work is affected
>> if a head node goes down. However, without the head node no new user
>> processes are created.
>>
>> Eddie
>>
>> On Tue, Apr 1, 2014 at 1:42 AM, reshu.agarwal wrote:
>>
>>> Hi,
>>>
>>> I have a question. If the head node fails, then we are no longer able to
>>> do UIMA processing. Can I define multiple head nodes in DUCC, so that if
>>> one head node fails, a second node takes over as the head node? Is this
>>> possible? What is the backup strategy of DUCC?
>>>
>>> --
>>> Thanks,
>>> Reshu Agarwal
>
> Hi Eddie,
>
> Sorry, I did not get you. What do you mean by "Currently DUCC can be
> configured such that no active user work is affected if a head node goes
> down"?
>
> As I understand it, if a node goes down or crashes then all the processes
> running on that node will terminate. So the big question is: how does DUCC
> ensure that no active user work is affected if the head node goes down?
>
> --
> Thanks,
> Reshu Agarwal
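[Editor's note: the master/slave broker setup Eddie mentions usually pairs with ActiveMQ's failover transport on the client side, so that job processes reconnect to whichever broker is alive. A minimal sketch, with illustrative hostnames; the DUCC property that holds the broker location varies by release, so check your ducc.properties.]

```
# Illustrative client-side connection URI for an ActiveMQ master/slave pair.
# The failover: transport tries each listed broker and reconnects on failure.
failover:(tcp://broker-master:61616,tcp://broker-slave:61616)
```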
Re: What if head node fails in DUCC
On 04/01/2014 05:28 PM, Eddie Epstein wrote:

> Correct. Most DUCC daemons running on the head node are restartable. We
> expect to complete this work so that in the case of head node failure a
> new head node can automatically be started.
>
> Currently DUCC can be configured such that no active user work is affected
> if a head node goes down. However, without the head node no new user
> processes are created.
>
> Eddie
>
> On Tue, Apr 1, 2014 at 1:42 AM, reshu.agarwal wrote:
>
>> Hi,
>>
>> I have a question. If the head node fails, then we are no longer able to
>> do UIMA processing. Can I define multiple head nodes in DUCC, so that if
>> one head node fails, a second node takes over as the head node? Is this
>> possible? What is the backup strategy of DUCC?
>>
>> --
>> Thanks,
>> Reshu Agarwal

Hi Eddie,

Sorry, I did not get you. What do you mean by "Currently DUCC can be configured such that no active user work is affected if a head node goes down"?

As I understand it, if a node goes down or crashes then all the processes running on that node will terminate. So the big question is: how does DUCC ensure that no active user work is affected if the head node goes down?

--
Thanks,
Reshu Agarwal
Re: Threadsafe advice for using SharedResourceObject
I assume you are talking about runtime, not initialization time. At runtime, it is convenient if the shared object has no mutable state. If it has some, you might want to wrap that in a ThreadLocal variable. If there is a problem during initialization, that would be a bug, I suppose.

-- Richard

On 01.04.2014, at 14:05, Marshall Schor wrote:

> On 4/1/2014 7:05 AM, Swril wrote:
>
>> I have an AE that I am running in a CPE.
>> The AE has an ExternalResource to a SharedResourceObject.
>> I set setMaxProcessingUnitThreadCount to 5.
>>
>> When I run the pipeline, I am getting errors like NPEs which hint that
>> things are out of step due to the multiple threads.
>>
>> Is there any advice on using an AE with an ExternalResource in a CPE?
>
> One thought: it's probably necessary to write the code you supply for the
> SharedResourceObject to be thread safe. A good source for understanding the
> complex details around how to do this is
> http://lmgtfy.com/?q=java+concurrency+in+practice .
>
> -Marshall
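[Editor's note: a sketch of Richard's ThreadLocal suggestion. It assumes a shared resource whose only mutable state is something non-thread-safe; java.text.SimpleDateFormat is used as the classic example. The class and field names are illustrative, not from the thread.]

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical shared resource: immutable configuration is shared freely,
// while the mutable, non-thread-safe helper gets one copy per thread.
class SharedGazetteer {
    // Immutable state loaded once: safe to share across CPE threads.
    private final String pattern = "yyyy-MM-dd";

    // SimpleDateFormat is not thread safe; ThreadLocal gives each
    // processing thread its own lazily-created instance.
    private final ThreadLocal<SimpleDateFormat> fmt =
            ThreadLocal.withInitial(() -> new SimpleDateFormat(pattern));

    String format(Date d) {
        return fmt.get().format(d); // each thread uses its own formatter
    }
}
```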
Re: Threadsafe advice for using SharedResourceObject
On 4/1/2014 7:05 AM, Swril wrote:

> I have an AE that I am running in a CPE.
> The AE has an ExternalResource to a SharedResourceObject.
> I set setMaxProcessingUnitThreadCount to 5.
>
> When I run the pipeline, I am getting errors like NPEs which hint that
> things are out of step due to the multiple threads.
>
> Is there any advice on using an AE with an ExternalResource in a CPE?

One thought: it's probably necessary to write the code you supply for the SharedResourceObject to be thread safe. A good source for understanding the complex details around how to do this is http://lmgtfy.com/?q=java+concurrency+in+practice .

-Marshall
Re: What if head node fails in DUCC
Correct. Most DUCC daemons running on the head node are restartable. We expect to complete this work so that in the case of head node failure a new head node can automatically be started.

Currently DUCC can be configured such that no active user work is affected if a head node goes down. However, without the head node no new user processes are created.

Eddie

On Tue, Apr 1, 2014 at 1:42 AM, reshu.agarwal wrote:

> Hi,
>
> I have a question. If the head node fails, then we are no longer able to
> do UIMA processing. Can I define multiple head nodes in DUCC, so that if
> one head node fails, a second node takes over as the head node? Is this
> possible? What is the backup strategy of DUCC?
>
> --
> Thanks,
> Reshu Agarwal
Re: problem in calling DUCC Service with ducc_submit
Declaring a service dependency does not affect application code paths; the job still needs to connect to the service in the normal way. DUCC uses service dependencies for several reasons: to automatically start services when needed by a job; to not give resources to a job or service for which a dependent service is not running; and to post a warning on running jobs when a dependent service goes "bad".

Eddie

On Tue, Apr 1, 2014 at 1:27 AM, reshu.agarwal wrote:

> Hi,
>
> I have run into another problem. I have successfully deployed a DUCC
> UIMA-AS service using ducc_service. The service status is available with
> good health. But if I point my job at this service with the
> service_dependency parameter in ducc_submit, it shows no error yet
> executes only the DB Collection Reader, not this service.
>
> --
> Thanks,
> Reshu Agarwal
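[Editor's note: a hypothetical ducc_submit job specification illustrating Eddie's point. Every name here (descriptors, queue, broker host) is made up, and the exact service_dependency endpoint syntax should be checked against the DUCC documentation for your release. The key idea is that the dependency line only ties the job's lifecycle to the service; routing CASes to the service must still be configured in the job's own descriptors.]

```
# Hypothetical job specification -- all names are illustrative.
description            = Job using a shared UIMA-AS service
driver_descriptor_CR   = org.example.DBCollectionReader
process_descriptor_AE  = desc/aggregate-with-remote-ae.xml

# Declares the dependency for DUCC's management purposes only;
# it does NOT by itself route work to the service.
service_dependency     = UIMA-AS:myServiceQueue:tcp://broker-host:61617
```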
Threadsafe advice for using SharedResourceObject
I have an AE that I am running in a CPE. The AE has an ExternalResource to a SharedResourceObject. I set setMaxProcessingUnitThreadCount to 5.

When I run the pipeline, I am getting errors like NPEs which hint that things are out of step due to the multiple threads.

Is there any advice on using an AE with an ExternalResource in a CPE?