[jira] Created: (UIMA-1107) Sofa mapping not applied when annotator loaded from PEAR
Sofa mapping not applied when annotator loaded from PEAR Key: UIMA-1107 URL: https://issues.apache.org/jira/browse/UIMA-1107 Project: UIMA Issue Type: Bug Components: Core Java Framework Affects Versions: 2.2.2 Reporter: Aaron Kaplan I have an aggregate annotator consisting of an annotator A1 that creates a new sofa, and an annotator A2 that annotates the new sofa. A2 is not sofa-aware, so in the aggregate descriptor I have defined a sofa mapping. In the delegateAnalysisEngine element of the aggregate descriptor, if I point to A2's component descriptor (A2/desc/A2.xml), the sofa mapping works: A2 processes the new sofa created by A1. If I point instead to A2's pear installation descriptor (A2/A2_pear.xml), the sofa mapping seems not to be applied: A2 processes the initial sofa instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (UIMA-1108) correct character offset for OpenCalais annotator
correct character offset for OpenCalais annotator - Key: UIMA-1108 URL: https://issues.apache.org/jira/browse/UIMA-1108 Project: UIMA Issue Type: Bug Components: Sandbox-CalaisAnnotator Reporter: Michael Baessler Assignee: Michael Baessler The Calais service does some text cleaning that manipulates the character offsets; this must be corrected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (UIMA-1108) correct character offset for OpenCalais annotator
[ https://issues.apache.org/jira/browse/UIMA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Baessler closed UIMA-1108. -- Resolution: Fixed correct character offset for OpenCalais annotator - Key: UIMA-1108 URL: https://issues.apache.org/jira/browse/UIMA-1108 Project: UIMA Issue Type: Bug Components: Sandbox-CalaisAnnotator Reporter: Michael Baessler Assignee: Michael Baessler The Calais service does some text cleaning that manipulates the character offsets; this must be corrected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Delta CAS
Eddie Epstein wrote: On Wed, Jul 9, 2008 at 1:51 AM, Thilo Goetz [EMAIL PROTECTED] wrote:

Nothing so easy. The CAS heap is one large int array. We grow it by allocating a new array with the new desired size and copying the old values over to the new one. There are several issues with this method:

* Copying the data takes a surprisingly long time. There's a test case in core that does nothing but add new FSs to the CAS, a lot of them. Marshall complained about how long it took to run when I added it (about 20s on my machine). If you profile that test case, you will see that the vast majority of time is spent in copying data from an old heap to a new heap. If the CAS becomes sufficiently large (in the hundreds of MBs), the time it takes to actually add FSs to the CAS is completely dwarfed by the time it takes for the heap to grow.

* The heap lives in a single large array, and a new single large array is allocated every time the heap grows. This is a challenge for the JVM, as it allocates this array in a contiguous block of memory. So there must be enough contiguous space on the JVM heap, which likely means a full heap compaction before a new large array can be allocated. Sometimes the JVM fails to allocate that contiguous space, even though there are enough free bytes on the JVM heap.

* Saved the best for last. When allocating a new array, the old one hangs around till we have copied the data. So we're using twice the necessary space for some period of time. That space is often not available. So any time I see an out-of-memory error for large documents (and it's not a bug in the annotator chain), it happens when the CAS heap grows; not because there isn't enough room for the larger heap, but because the old one is still there as well. The CAS can only grow to about half the size we have memory for because of that issue.

The situation is more complicated than portrayed.
The heap does not have to shrink, so the growth penalty is rare and can be eliminated entirely if the max necessary size heap is specified at startup. FS allocated in the heap do

You don't want to allocate a max heap size of 500M just because you may need one that big. You don't even want to allocate 10M ahead of time because if you have many small documents, you can do more parallel processing. So no, I can't specify a large enough heap at start-up and yes, the heap most certainly has to shrink on CAS reset.

not have any Java object memory overhead. Garbage collection for separate FS objects would be [much?] worse than the time it takes currently to clear the used part of a CAS heap.

I won't believe this until I see it, but I wasn't suggesting this so I'm not going to argue the point, either.

Going forward, one approach to this problem could be not one heap array, but a list of arrays. Every time we grow the heap, we would just add another array. That approach solves all the problems mentioned above while being minimally invasive to the way the CAS currently works. However, it raises a new issue: how do you address cells across several arrays in an efficient manner? We don't want to improve performance for large docs at the expense of small ones. So heap addresses might stop being the linear sequence of integers they are today. Maybe we'll use the high bits to address the array, and the low bits to address cells in a given array. And there goes the watermark. Maybe this won't be necessary, I don't know at this point.

Each FS object could include an ID that would allow maintaining a high water mark, of course at the expense of another 4 bytes per. With a heap constructed from multiple discontiguous arrays, each array could include a relative ID. This is not to say that the high water mark is always the right approach :)

I'm trying to decrease the memory overhead, not increase it.

Excellent suggestion, except why not have this discussion now? We just need to put our heads together and figure out how to address this requirement to everybody's satisfaction, case closed. I'm not disagreeing with the requirement, just the proposed implementation thereof. Doing this now may save us (ok, me) a lot of trouble later.

Who is against having the discussion now :) Marshall seemed to favor a discussion at a later point. Maybe I misinterpreted.

Eddie
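The grow-by-copy scheme described above can be sketched as follows. The class and method names here are illustrative, not the actual UIMA CAS internals; the point is the grow step, where the old and new arrays coexist during the copy, which is why the CAS can only grow to about half of available memory:

```java
// Hypothetical sketch of a heap that grows by allocating a bigger array
// and copying, as described in the discussion. Not the real CAS classes.
public class SimpleHeap {
    private int[] heap;
    private int pos = 1; // cell 0 unused; heap addresses start at 1

    public SimpleHeap(int initialSize) {
        heap = new int[initialSize];
    }

    // Allocate 'size' cells; returns the address of the first cell.
    public int allocate(int size) {
        if (pos + size > heap.length) {
            grow(Math.max(heap.length * 2, pos + size));
        }
        int addr = pos;
        pos += size;
        return addr;
    }

    // The problematic step: the old and new arrays coexist until the copy
    // finishes, so peak memory is old + new. A doubling grow needs ~3x the
    // old array's memory to be free at once.
    private void grow(int newSize) {
        int[] newHeap = new int[newSize];
        System.arraycopy(heap, 0, newHeap, 0, pos);
        heap = newHeap;
    }

    public int size() { return heap.length; }
}
```

Beyond the doubled peak memory, the new array must be a single contiguous block on the JVM heap, which is what triggers the compaction and allocation failures mentioned above.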
Re: Delta CAS
Thilo Goetz wrote: Eddie Epstein wrote: On Wed, Jul 9, 2008 at 1:51 AM, Thilo Goetz [EMAIL PROTECTED] wrote:

Nothing so easy. The CAS heap is one large int array. We grow it by allocating a new array with the new desired size and copying the old values over to the new one. There are several issues with this method:

* Copying the data takes a surprisingly long time. There's a test case in core that does nothing but add new FSs to the CAS, a lot of them. Marshall complained about how long it took to run when I added it (about 20s on my machine). If you profile that test case, you will see that the vast majority of time is spent in copying data from an old heap to a new heap. If the CAS becomes sufficiently large (in the hundreds of MBs), the time it takes to actually add FSs to the CAS is completely dwarfed by the time it takes for the heap to grow.

* The heap lives in a single large array, and a new single large array is allocated every time the heap grows. This is a challenge for the JVM, as it allocates this array in a contiguous block of memory. So there must be enough contiguous space on the JVM heap, which likely means a full heap compaction before a new large array can be allocated. Sometimes the JVM fails to allocate that contiguous space, even though there are enough free bytes on the JVM heap.

* Saved the best for last. When allocating a new array, the old one hangs around till we have copied the data. So we're using twice the necessary space for some period of time. That space is often not available. So any time I see an out-of-memory error for large documents (and it's not a bug in the annotator chain), it happens when the CAS heap grows; not because there isn't enough room for the larger heap, but because the old one is still there as well. The CAS can only grow to about half the size we have memory for because of that issue.

The situation is more complicated than portrayed.
The heap does not have to shrink, so the growth penalty is rare and can be eliminated entirely if the max necessary size heap is specified at startup. FS allocated in the heap do

You don't want to allocate a max heap size of 500M just because you may need one that big. You don't even want to allocate 10M ahead of time because if you have many small documents, you can do more parallel processing. So no, I can't specify a large enough heap at start-up and yes, the heap most certainly has to shrink on CAS reset.

Some intermediate approach might help here - such as an application or annotator being able to provide performance tuning hints to the framework. For instance, a tokenizer might be able to guesstimate the number of tokens, based on some average token size estimate divided into the size of the document, and provide that as a hint.

not have any Java object memory overhead. Garbage collection for separate FS objects would be [much?] worse than the time it takes currently to clear the used part of a CAS heap.

I won't believe this until I see it, but I wasn't suggesting this so I'm not going to argue the point, either.

Going forward, one approach to this problem could be not one heap array, but a list of arrays. Every time we grow the heap, we would just add another array. That approach solves all the problems mentioned above while being minimally invasive to the way the CAS currently works. However, it raises a new issue: how do you address cells across several arrays in an efficient manner? We don't want to improve performance for large docs at the expense of small ones. So heap addresses might stop being the linear sequence of integers they are today. Maybe we'll use the high bits to address the array, and the low bits to address cells in a given array. And there goes the watermark. Maybe this won't be necessary, I don't know at this point.
Each FS object could include an ID that would allow maintaining a high water mark, of course at the expense of another 4 bytes per. With a heap constructed from multiple discontiguous arrays, each array could include a relative ID. This is not to say that the high water mark is always the right approach :)

I'm trying to decrease the memory overhead, not increase it.

Would there be a solution that would work for the multi-block heap, without adding 4 bytes per FS Object?

Excellent suggestion, except why not have this discussion now? We just need to put our heads together and figure out how to address this requirement to everybody's satisfaction, case closed. I'm not disagreeing with the requirement, just the proposed implementation thereof. Doing this now may save us (ok, me) a lot of trouble later.

Who is against having the discussion now :) Marshall seemed to favor a discussion at a later point. Maybe I misinterpreted.

I did not intend to express favoring a discussion at a later point versus now. Discussions at any point are good, IMHO. -Marshall

Eddie
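The tuning-hint arithmetic suggested above (estimated tokens = document size divided by an average token size) could look like this minimal sketch; the class, methods, and the per-token heap-cell cost are all hypothetical, invented for illustration only:

```java
// Hypothetical sketch of the performance-tuning-hint idea: a component
// estimates its output size from the document size, and the framework could
// use that to pre-size the CAS heap. All names and constants are invented.
public class HeapHint {
    // Guesstimate token count from document length and an assumed
    // average token length in characters.
    public static int estimateTokens(int docSizeChars, int avgTokenChars) {
        return docSizeChars / avgTokenChars;
    }

    // Turn the token estimate into a suggested initial heap size,
    // assuming some average number of heap cells per token annotation.
    public static int suggestedHeapCells(int docSizeChars, int avgTokenChars,
                                         int cellsPerToken) {
        return estimateTokens(docSizeChars, avgTokenChars) * cellsPerToken;
    }
}
```

A hint like this would let the framework start the heap near its final size for a given document, avoiding most grow-and-copy cycles without permanently reserving a worst-case heap.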
Re: Delta CAS
On Wed, Jul 9, 2008 at 9:18 AM, Thilo Goetz [EMAIL PROTECTED] wrote: You don't want to allocate a max heap size of 500M just because you may need one that big. You don't even want to allocate 10M ahead of time because if you have many small documents, you can do more parallel processing. So no, I can't specify a large enough heap at start-up and yes, the heap most certainly has to shrink on CAS reset. Sounds like your scenario has multiple threads, each with at least one CAS, processing a mixed size of documents. Either there is enough Java heap space to process multiple large documents at the same time or not. Pre-allocating the CAS heap space and not letting them grow enables soft processing failures of large documents rather than the unfortunate failure of the entire JVM. Can you say more about the scenario(s) we are optimizing for?
Re: Delta CAS
Eddie Epstein wrote: On Wed, Jul 9, 2008 at 9:18 AM, Thilo Goetz [EMAIL PROTECTED] wrote: You don't want to allocate a max heap size of 500M just because you may need one that big. You don't even want to allocate 10M ahead of time because if you have many small documents, you can do more parallel processing. So no, I can't specify a large enough heap at start-up and yes, the heap most certainly has to shrink on CAS reset. Sounds like your scenario has multiple threads, each with at least one CAS, I don't usually have the luxury of running just UIMA on a server. Other processes want memory, too. processing a mixed size of documents. Either there is enough Java heap space to process multiple large documents at the same time or not. Pre-allocating the CAS heap space and not letting them grow enables soft processing failures of large documents rather than the unfortunate failure of the entire JVM. Can you say more about the scenario(s) we are optimizing for? Variously sized documents, some of them very large, many very small.
Re: Delta CAS
Marshall Schor wrote: Some intermediate approach might help here - such as an application or annotator being able to provide performance tuning hints to the framework. For instance, a tokenizer might be able to guesstimate the number of tokens, based on some average token size estimate divided into the size of the document, and provide that as a hint. Tell me about it. We've built a whole framework to try and figure out ahead of time how much memory processing a certain document is going to take, so we know how many threads we can run in parallel before crashing the JVM. This turns out to be quite difficult if you don't know what kinds of documents you'll be getting, and you work with many different languages. --Thilo
Re: Delta CAS
I think we need another thread to discuss the heap. Back to the high-water mark ... isn't it just the largest xmi id in the serialized CAS? Its relationship to the CAS heap is a matter of implementation but presumably we can have a design that says any new FSs must be given an xmi id above the high-water mark when serialized back from a service. We already have the requirement that ids must be preserved for the merging of parallel replies. Burn.
[jira] Updated: (UIMA-1096) Incorrect metaData returned when deployed as a separate process
[ https://issues.apache.org/jira/browse/UIMA-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Iyer updated UIMA-1096: --- Attachment: UIMA-1096.patch Fixed the inconsistencies described. One clarification - the default value for multiple references allowed is false. Incorrect metaData returned when deployed as a separate process --- Key: UIMA-1096 URL: https://issues.apache.org/jira/browse/UIMA-1096 Project: UIMA Issue Type: Bug Components: C++ Framework Reporter: Burn Lewis Attachments: UIMA-1096.patch Comparing the getMeta reply from a service deployed as a separate C++ process with that from one deployed via JNI I see the following: 1) A typePriority index key is changed: <fsIndexKey> <typePriority/> </fsIndexKey> becomes: <fsIndexKey> <featureName> </featureName> <comparator>standard</comparator> </fsIndexKey> 2) Invalid xml chars are not escaped, e.g. <description>NAMED &gt; NOMINAL &gt; PRONOMINAL.</description> becomes <description>NAMED > NOMINAL > PRONOMINAL.</description> 3) The default of <multipleReferencesAllowed>true</multipleReferencesAllowed> is inserted in many featureDescriptions 4) These may be bugs in the JNI output: both typePriorities and operationalProperties are only in the JNI reply. 5) First 2 lines are missing the encoding and xmlns attributes: <?xml version="1.0" encoding="UTF-8"?> <analysisEngineMetaData xmlns="http://uima.apache.org/resourceSpecifier"> -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (UIMA-1096) Incorrect metaData returned when deployed as a separate process
[ https://issues.apache.org/jira/browse/UIMA-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Iyer reassigned UIMA-1096: -- Assignee: Bhavani Iyer Incorrect metaData returned when deployed as a separate process --- Key: UIMA-1096 URL: https://issues.apache.org/jira/browse/UIMA-1096 Project: UIMA Issue Type: Bug Components: C++ Framework Reporter: Burn Lewis Assignee: Bhavani Iyer Attachments: UIMA-1096.patch Comparing the getMeta reply from a service deployed as a separate C++ process with that from one deployed via JNI I see the following: 1) A typePriority index key is changed: <fsIndexKey> <typePriority/> </fsIndexKey> becomes: <fsIndexKey> <featureName> </featureName> <comparator>standard</comparator> </fsIndexKey> 2) Invalid xml chars are not escaped, e.g. <description>NAMED &gt; NOMINAL &gt; PRONOMINAL.</description> becomes <description>NAMED > NOMINAL > PRONOMINAL.</description> 3) The default of <multipleReferencesAllowed>true</multipleReferencesAllowed> is inserted in many featureDescriptions 4) These may be bugs in the JNI output: both typePriorities and operationalProperties are only in the JNI reply. 5) First 2 lines are missing the encoding and xmlns attributes: <?xml version="1.0" encoding="UTF-8"?> <analysisEngineMetaData xmlns="http://uima.apache.org/resourceSpecifier"> -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Delta CAS
Here's a suggestion prompted by previous posts, and by common hardware design for segmented memory. Take the int values that represent feature structure (fs) references. Today, these are positive numbers from 1 (I think) to around 4 billion. These values are used directly as an index into the heap. Change this to split the bits in these int values into two parts, let's call them upper and lower. For example, write a reference as xxxyyyyy, where the xxx's are the upper bits (each letter represents a hex digit) and the y's the lower bits. The y's in this case can represent numbers up to 1 million (approx), and the xxx's represent 4096 values. Then allocate the heap using multiple 1-meg-entry tables, and store each one in the 4096-entry reference array. The heap reference would be some bit-wise shifting and indexed lookup in addition to what we have now. That would probably be very fast, and could be optimized for the xxx=0 case to be even faster. This breaks heaps of over 1 meg into separate parts, which would make them more manageable, I think, and keeps the high-water mark method viable, too. Opinions? -Marshall
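A minimal sketch of this split-address scheme, using the numbers above: 20 low bits for about 1M cells per segment, 12 high bits for up to 4096 segments. All names are hypothetical, not actual UIMA classes. Growth adds a segment instead of copying; note the hole left at the top of a segment when the next FS doesn't fit, one of the costs of this design:

```java
// Hypothetical sketch of a segmented CAS heap addressed by split int
// references: upper 12 bits select a segment, lower 20 bits an offset.
public class SegmentedHeap {
    static final int SEG_BITS = 20;              // cells per segment = 2^20
    static final int SEG_SIZE = 1 << SEG_BITS;
    static final int OFFSET_MASK = SEG_SIZE - 1;

    private final int[][] segments = new int[4096][];
    private int numSegments = 0;
    private int pos = 1;                         // next free cell; 0 reserved

    public SegmentedHeap() {
        segments[numSegments++] = new int[SEG_SIZE];
    }

    // Growth just adds one segment: no copying, no doubled peak memory.
    private void addSegment() {
        segments[numSegments++] = new int[SEG_SIZE];
    }

    // Allocate 'size' contiguous cells within one segment. If they don't
    // fit in the current segment, skip to a fresh one, leaving a hole at
    // the top. (An FS larger than a whole segment is not handled here.)
    public int allocate(int size) {
        int seg = pos >>> SEG_BITS;
        if ((pos & OFFSET_MASK) + size > SEG_SIZE) {
            seg++;
            pos = seg << SEG_BITS;
        }
        if (seg >= numSegments) {
            addSegment();
        }
        int addr = pos;
        pos += size;
        return addr;
    }

    // A heap access becomes one shift, one mask, and two indexed lookups.
    public int get(int addr) {
        return segments[addr >>> SEG_BITS][addr & OFFSET_MASK];
    }

    public void set(int addr, int value) {
        segments[addr >>> SEG_BITS][addr & OFFSET_MASK] = value;
    }
}
```

Because each reference still fits in one int and addresses within a segment remain monotonic, a per-segment relative ID (or the packed value itself) could still support a high-water-mark style check.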
[jira] Updated: (UIMA-1104) Need a monitor component for UIMA-AS services to capture performance metrics
[ https://issues.apache.org/jira/browse/UIMA-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Cwiklik updated UIMA-1104: Attachment: uimaj-as-core-UIMA-1104-patch-04.txt uimaj-as-activemq-UIMA-1104-patch-04.txt Fixed idle time for remotes Need a monitor component for UIMA-AS services to capture performance metrics - Key: UIMA-1104 URL: https://issues.apache.org/jira/browse/UIMA-1104 Project: UIMA Issue Type: New Feature Components: Async Scaleout Reporter: Jerry Cwiklik Attachments: idleWithRemote.txt, uimaj-as-activemq-UIMA-1104-patch-03.txt, uimaj-as-activemq-UIMA-1104-patch-04.txt, uimaj-as-activemq-UIMA-1104-patch.txt, uimaj-as-core-UIMA-1104-patch-02.txt, uimaj-as-core-UIMA-1104-patch-03.txt, uimaj-as-core-UIMA-1104-patch-04.txt, uimaj-as-core-UIMA-1104-patch.txt In complex uima-as deployments it is hard to find bottlenecks which need scaleup. A JMX-based monitor is needed to collect runtime metrics from every uima-as service. The metrics must include idle time, queue depth, amount of time each service waits for a free CAS. The monitor should be an embeddable component that can be deployed in a java application. The monitor should allow custom formatting of metrics via pluggable extension. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (UIMA-1104) Need a monitor component for UIMA-AS services to capture performance metrics
[ https://issues.apache.org/jira/browse/UIMA-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Cwiklik updated UIMA-1104: Attachment: uimaj-as-core-UIMA-1104-patch-05.txt uimaj-as-activemq-UIMA-1104-patch-05.txt Removed debugging output from the code Need a monitor component for UIMA-AS services to capture performance metrics - Key: UIMA-1104 URL: https://issues.apache.org/jira/browse/UIMA-1104 Project: UIMA Issue Type: New Feature Components: Async Scaleout Reporter: Jerry Cwiklik Attachments: idleWithRemote.txt, uimaj-as-activemq-UIMA-1104-patch-03.txt, uimaj-as-activemq-UIMA-1104-patch-04.txt, uimaj-as-activemq-UIMA-1104-patch-05.txt, uimaj-as-activemq-UIMA-1104-patch.txt, uimaj-as-core-UIMA-1104-patch-02.txt, uimaj-as-core-UIMA-1104-patch-03.txt, uimaj-as-core-UIMA-1104-patch-04.txt, uimaj-as-core-UIMA-1104-patch-05.txt, uimaj-as-core-UIMA-1104-patch.txt In complex uima-as deployments it is hard to find bottlenecks which need scaleup. A JMX-based monitor is needed to collect runtime metrics from every uima-as service. The metrics must include idle time, queue depth, amount of time each service waits for a free CAS. The monitor should be an embeddable component that can be deployed in a java application. The monitor should allow custom formatting of metrics via pluggable extension. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Delta CAS
Back to the high-water mark ... isn't it just the largest xmi id in the serialized CAS? Its relationship to the CAS heap is a matter of implementation but presumably we can have a design that says any new FSs must be given an xmi id above the high-water mark when serialized back from a service. We already have the requirement that ids must be preserved for the merging of parallel replies. Yes - there are really two definitions of high-water mark floating around in this thread and it would be good to split them apart. (1) the largest xmi:id in the serialized CAS. This is a requirement that the service protocol places on the CAS serializer. This is what we already have for merging, and I don't think Thilo is objecting to this. (2) a dependency on the FS address being an indicator of which FS are newer than others (an FS with a larger address is newer). As I think about it now I am actually unclear on whether we are doing #2 right now at all. Bhavani said we were, but that's not how I recall that the serializer currently works. It keeps a table of all the incoming FS, which is necessary in order to have the xmi:ids going out be the same as the ones coming in. So I thought the serializer just used the fact that an FS was missing from this table to determine that it was new, and *not* a high water mark of the FS address. Bhavani, can you clarify? -Adam
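The two "new FS" tests being distinguished above can be contrasted in a small sketch. The names here are hypothetical, not the actual XmiCasSerializer internals: with a table of incoming ids, an FS is new iff it is absent from the table; with a high-water mark, an FS is new iff its id exceeds the largest incoming id:

```java
// Sketch contrasting the two delta-detection tests under discussion
// (hypothetical helper, not real UIMA serializer code).
import java.util.Set;

public class DeltaCheck {
    // Option 1: keep a table of all incoming xmi:ids; anything not in the
    // table was created after deserialization (what Adam recalls).
    public static boolean isNewByTable(int xmiId, Set<Integer> incomingIds) {
        return !incomingIds.contains(xmiId);
    }

    // Option 2: compare against the high-water mark, i.e. the largest
    // xmi:id in the serialized CAS (what Burn describes).
    public static boolean isNewByHighWaterMark(int xmiId, int highWaterMark) {
        return xmiId > highWaterMark;
    }
}
```

The two tests disagree when ids are sparse: with incoming ids {1, 2, 5}, an FS given id 3 is new under the table test but not under the high-water-mark test, which is exactly why it matters which one the serializer actually implements.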
[jira] Updated: (UIMA-1105) CPE is stuck trying to retrieve a free CAS from the pool
[ https://issues.apache.org/jira/browse/UIMA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Cwiklik updated UIMA-1105: Attachment: ProcessingUnit-patch.txt Fixes the hang in the CPE. The CPE, during its error handling, was trying to fetch a free CAS instance from a CAS pool to convert the CasData representation to CasObject. This conversion has already been done prior to calling the process() method and is unnecessary. A call to notifyListeners() contains an incorrect arg value which forces the code to attempt to fetch a new CAS for conversion. Moved the code that sets the right parameter for the notifyListeners() to just before the process() call. When an exception happens, the code doesn't attempt to fetch a new CAS from a CAS pool. CPE is stuck trying to retrieve a free CAS from the pool Key: UIMA-1105 URL: https://issues.apache.org/jira/browse/UIMA-1105 Project: UIMA Issue Type: Bug Components: Collection Processing Affects Versions: 2.2.1 Environment: Windows XP 32 bits Reporter: Olivier Terrier Attachments: cpe.xml, ProcessingUnit-patch.txt, uima.zip Buggy scenario is a CPE with a first remote processor deployed as a Vinci service and an integrated CAS consumer that throws a ResourceProcessException in its process method. It is quite easy to reproduce with a dummy consumer with this implementation public void processCas(CAS aCAS) throws ResourceProcessException { throw new ResourceProcessException(new FileNotFoundException("file not found")); } It looks like the CPE is stuck trying to retrieve a CAS from the CAS pool that is apparently empty at some point. My feeling is that when you have a ResourceProcessException thrown in the last component of the CPE, the code that is supposed to release the CAS from the CAS pool is not properly called... 
If I suspend the process in Eclipse I can see that the CasConsumer and the Collection Reader pipelines Threads are waiting on the CPECasPool.getCas(long) method I attach the uima.log set to the FINEST level -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Delta CAS
No opinions, but a few observations:

* 1M is way too big for some applications that need very small, but very many CASes.

* Large arrays may be bigger than whatever segment size is chosen, making segment management a bit more complicated.

* There will be holes at the top of every segment when the next FS doesn't fit.

Eddie

On Wed, Jul 9, 2008 at 2:37 PM, Marshall Schor [EMAIL PROTECTED] wrote: Here's a suggestion suggested by previous posts, and common hardware design for segmented memory. Take the int values that represent feature structure (fs) references. Today, these are positive numbers from 1 (I think) to around 4 billion. These values are used directly as an index into the heap. Change this to split the bits in these int values into two parts, let's call them upper and lower. For example where the xxx's are the upper bits (each x represents a hex digit), and the y's the lower bits. The y's in this case can represent numbers up to 1 million (approx), and the xxx's represent 4096 values. Then allocate the heap using multiple 1 meg entry tables, and store each one in the 4096 entry reference array. The heap reference would be some bit-wise shifting and indexed lookup in addition to what we have now and would probably be very fast, and could be optimized for the xxx=0 case to be even faster. This breaks heaps of over 1 meg into separate parts, which would make them more manageable, I think, and keeps the high-water mark method viable, too. Opinions? -Marshall
[jira] Resolved: (UIMA-1105) CPE is stuck trying to retrieve a free CAS from the pool
[ https://issues.apache.org/jira/browse/UIMA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marshall Schor resolved UIMA-1105. -- Resolution: Fixed Applied the patch. I'll attach the modified Jar to this issue, for testing. - Olivier - can you test it? I think you should just be able to do this by replacing the jar in your classpath (unless there are other changes... this was built on the latest version in the trunk, but I don't think this part of the code has been changing very much). CPE is sutck trying to retrieve a free CAS from the pool Key: UIMA-1105 URL: https://issues.apache.org/jira/browse/UIMA-1105 Project: UIMA Issue Type: Bug Components: Collection Processing Affects Versions: 2.2.1 Environment: Windows XP 32 bits Reporter: Olivier Terrier Attachments: cpe.xml, ProcessingUnit-patch.txt, uima.zip Buggy scenario is a CPE with a first remote processor deployed as a Vinci service and an integrated CAS consumer that throws a ResourceProcessException in its process method. It is quite easy to reproduce with a dummy consumer with this implementation public void processCas(CAS aCAS) throws ResourceProcessException { throw new ResourceProcessException(new FileNotFoundException(file not found)); } It looks like the CPE is stuck trying to retrieve a CAS from the CAS pool that is apparently empty at some point. My feeling is that when you have an ResourceProcessException thrown in the last component of the CPE, the code that is supposed to release the CAS from the CAS pool is not properly called... If I suspend the process in Eclipse I can see that the CasConsumer and the Collection Reader pipelines Threads are waiting on the CPECasPool.getCas(long) method I attach the uima.log set to the FINEST level -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (UIMA-1105) CPE is stuck trying to retrieve a free CAS from the pool
[ https://issues.apache.org/jira/browse/UIMA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marshall Schor reassigned UIMA-1105: Assignee: Marshall Schor CPE is sutck trying to retrieve a free CAS from the pool Key: UIMA-1105 URL: https://issues.apache.org/jira/browse/UIMA-1105 Project: UIMA Issue Type: Bug Components: Collection Processing Affects Versions: 2.2.1 Environment: Windows XP 32 bits Reporter: Olivier Terrier Assignee: Marshall Schor Attachments: cpe.xml, ProcessingUnit-patch.txt, uima.zip Buggy scenario is a CPE with a first remote processor deployed as a Vinci service and an integrated CAS consumer that throws a ResourceProcessException in its process method. It is quite easy to reproduce with a dummy consumer with this implementation public void processCas(CAS aCAS) throws ResourceProcessException { throw new ResourceProcessException(new FileNotFoundException(file not found)); } It looks like the CPE is stuck trying to retrieve a CAS from the CAS pool that is apparently empty at some point. My feeling is that when you have an ResourceProcessException thrown in the last component of the CPE, the code that is supposed to release the CAS from the CAS pool is not properly called... If I suspend the process in Eclipse I can see that the CasConsumer and the Collection Reader pipelines Threads are waiting on the CPECasPool.getCas(long) method I attach the uima.log set to the FINEST level -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (UIMA-1105) CPE is stuck trying to retrieve a free CAS from the pool
[ https://issues.apache.org/jira/browse/UIMA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marshall Schor updated UIMA-1105: - Fix Version/s: 2.3 CPE is sutck trying to retrieve a free CAS from the pool Key: UIMA-1105 URL: https://issues.apache.org/jira/browse/UIMA-1105 Project: UIMA Issue Type: Bug Components: Collection Processing Affects Versions: 2.2.1 Environment: Windows XP 32 bits Reporter: Olivier Terrier Assignee: Marshall Schor Fix For: 2.3 Attachments: cpe.xml, ProcessingUnit-patch.txt, uima.zip Buggy scenario is a CPE with a first remote processor deployed as a Vinci service and an integrated CAS consumer that throws a ResourceProcessException in its process method. It is quite easy to reproduce with a dummy consumer with this implementation public void processCas(CAS aCAS) throws ResourceProcessException { throw new ResourceProcessException(new FileNotFoundException(file not found)); } It looks like the CPE is stuck trying to retrieve a CAS from the CAS pool that is apparently empty at some point. My feeling is that when you have an ResourceProcessException thrown in the last component of the CPE, the code that is supposed to release the CAS from the CAS pool is not properly called... If I suspend the process in Eclipse I can see that the CasConsumer and the Collection Reader pipelines Threads are waiting on the CPECasPool.getCas(long) method I attach the uima.log set to the FINEST level -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (UIMA-1105) CPE is stuck trying to retrieve a free CAS from the pool
[ https://issues.apache.org/jira/browse/UIMA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marshall Schor updated UIMA-1105: - Attachment: uima-cpe.jar build for testing CPE is sutck trying to retrieve a free CAS from the pool Key: UIMA-1105 URL: https://issues.apache.org/jira/browse/UIMA-1105 Project: UIMA Issue Type: Bug Components: Collection Processing Affects Versions: 2.2.1 Environment: Windows XP 32 bits Reporter: Olivier Terrier Assignee: Marshall Schor Fix For: 2.3 Attachments: cpe.xml, ProcessingUnit-patch.txt, uima-cpe.jar, uima.zip Buggy scenario is a CPE with a first remote processor deployed as a Vinci service and an integrated CAS consumer that throws a ResourceProcessException in its process method. It is quite easy to reproduce with a dummy consumer with this implementation public void processCas(CAS aCAS) throws ResourceProcessException { throw new ResourceProcessException(new FileNotFoundException(file not found)); } It looks like the CPE is stuck trying to retrieve a CAS from the CAS pool that is apparently empty at some point. My feeling is that when you have an ResourceProcessException thrown in the last component of the CPE, the code that is supposed to release the CAS from the CAS pool is not properly called... If I suspend the process in Eclipse I can see that the CasConsumer and the Collection Reader pipelines Threads are waiting on the CPECasPool.getCas(long) method I attach the uima.log set to the FINEST level -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Delta CAS
If we are thinking of Delta CAS in the context of a service, the largest xmi id works. But we were also using the same mechanism to support tracking CAS activity by component. I suppose in the second case the additional overhead of maintaining a list of the FSs that are added may be acceptable. On Wed, Jul 9, 2008 at 3:48 PM, Adam Lally [EMAIL PROTECTED] wrote: Back to the high-water mark ... isn't it just the largest xmi id in the serialized CAS? Its relationship to the CAS heap is a matter of implementation but presumably we can have a design that says any new FSs must be given an xmi id above the high-water mark when serialized back from a service. We already have the requirement that ids must be preserved for the merging of parallel replies. Yes - there are really two definitions of high-water mark floating around in this thread and it would be good to split them apart. (1) the largest xmi:id in the serialized CAS. This is a requirement that the service protocol places on the CAS serializer. This is what we already have for merging, and I don't think Thilo is objecting to this. (2) a dependency on the FS address being an indicator of which FS are newer than others (an FS with a larger address is newer). As I think about it now I am actually unclear on whether we are doing #2 right now at all. Bhavani said we were, but that's not how I recall that the serializer currently works. It keeps a table of all the incoming FS, which is necessary in order to have the xmi:ids going out be the same as the ones coming in. So I thought the serializer just used the fact that an FS was missing from this table to determine that it was new, and *not* a high water mark of the FS address. Bhavani, can you clarify? -Adam