Re: small memory footprint tradeoff configuration
Eddie Epstein wrote: Process calls to a Vinci service have always broken FS references. Same for calls thru the compatibility wrapper that allows calling colocated UIMA 1.4x annotators from Apache UIMA. Actually, I think that the compatibility wrapper does preserve FS addresses because it uses binary serialization. But Vinci definitely does not. Eddie
Re: small memory footprint tradeoff configuration
On Tue, Feb 24, 2009 at 2:53 AM, Thilo Goetz twgo...@gmx.de wrote:

I have found the discussion again that I was referring to. It wasn't on this list, it was in the OASIS spec discussions. Sorry about the confusion. I don't feel at liberty to publish that conversation here, but maybe Adam would like to comment? He and I were debating this at the time (nearly two years ago).

I'm not sure which OASIS discussion you mean (is it about xmi:id consistency?), but I thought the link that Marshall posted was a reasonable summary of the discussion, including the concerns that I had: http://markmail.org/thread/aolbz4nrvmgjhuyb. The only sticking point I was really concerned about was the invalidation of the FS handle held by an application. But it was definitely not my intention to shoot down any work in this area (in fact, you'll see in that email thread where I explicitly said I'm in favor of doing something in this space). I just want to discuss it and see if we can come to a mutually acceptable plan.

To address Eddie's point about Vinci services breaking FS handles already: I consider that a bug, so I am not happy using it as a rationale to invalidate FS handles as a general policy. And I'm worried that users who haven't been using Vinci services (I bet we have plenty of those) have built applications that rely on FS handles staying valid. I remember suggesting that we post on the user list about this, but I am not sure if we ever did.

If you do a GC approach, is there not any way to include application-created FeatureStructures as part of the root set? Or, to look at it another way, the set of FSs that you do the GC over is only those created since the CAS was input to the current AE (possibly an aggregate).

It seems like Marshall's angle (if I understood it) is not really GC at all, but a model where an annotator decides to explicitly delete FSs. I could be okay with that idea, too. A GC model by definition should preserve any referenced FSs, but if we say we have an explicit deletion model where anybody can delete anyone else's stuff, at least we won't confuse people about what's going on. Current applications that use existing annotators would not break (because the annotators would not delete anything), and if a new annotator is introduced that breaks the application, it's the annotator's fault for being too aggressive in deleting stuff that someone else might still need.

-Adam
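The root-set question raised in this message can be illustrated with a small, self-contained sketch. It is not UIMA code: FSs are modeled as plain integer ids, and the class name `FsReachability`, the method `reachable`, and the reference map are all hypothetical. It only shows that an FS held by the application survives a mark phase if, and only if, the application's handles are counted among the roots.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; FSs are modeled as integer ids, not CAS objects. */
final class FsReachability {

    /**
     * Mark phase of a tracing GC: returns every FS id reachable from the roots.
     * refs maps an FS id to the ids of the FSs it references.
     */
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, Collection<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Integer fs = pending.pop();
            if (marked.add(fs)) { // first visit: follow outgoing references
                for (Integer target : refs.getOrDefault(fs, Collections.<Integer>emptyList())) {
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(1, Arrays.asList(2)); // indexed FS 1 references FS 2
        refs.put(4, Arrays.asList(5)); // FS 4 is held only by the application

        // Roots = indexed FSs only: FS 4 and FS 5 would be collected.
        System.out.println(reachable(refs, Arrays.asList(1)));
        // Roots = indexed FSs plus application-held handles: FS 4 and FS 5 survive.
        System.out.println(reachable(refs, Arrays.asList(1, 4)));
    }
}
```

The open problem in the thread is how the framework would learn about the application-held handles in the first place; the sketch simply assumes they are known.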
[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS
[ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676308#action_12676308 ]

Burn Lewis commented on UIMA-1245:

To summarize my understanding of recent discussions ...

First, I'd like to suggest that the default should not change. Processing the parent last does not guarantee that UIMA-AS will act like core UIMA: in addition, the size of all downstream pools must be set to 1 to ensure that each child is processed sequentially. We should document the settings needed for UIMA-like processing, but I think the default should be UIMA-AS style processing, i.e. processParentLast=false.

With the current design, parents are held in the final step of an aggregate until all children have completed processing in that aggregate. This ensures that any child errors can be reported on the input CAS, and that aggregate CMs satisfy the CM contract of not processing the parent until all children have been returned. If this aggregate is nested in another, the same conditions hold at the final step of the outer aggregate.

But with this new processParentLast=true option, the parent must be held after the CM until all of its children have completed processing in all aggregates, i.e. have been returned to their pool. Unlike the previous case, we must track the number of children active in any of the nested aggregates.

Processing order of parent CAS different on UIMA and UIMA AS
Key: UIMA-1245
URL: https://issues.apache.org/jira/browse/UIMA-1245
Project: UIMA
Issue Type: Bug
Components: Async Scaleout
Reporter: Eddie Epstein

Arron Kaplan raised the question of when parent CASes are processed relative to their children. See http://markmail.org/message/5cop7iv2nshouhgs

As of now, the processing order for a multi-threaded UIMA AS aggregate is different from that for a single-threaded UIMA aggregate. A discussion with Burn, Adam, Jerry, Marshall and myself concluded that the default processing order for UIMA AS should be changed to be the same as in UIMA, in order to have the same application behavior for both. This will be done by suspending the flow of a parent CAS after it is returned from a CasMultiplier delegate until all its child CASes have finished processing. However, there also needs to be a UIMA AS deployment option for CasMultiplier delegates that allows the parent CAS to resume processing immediately after being returned from the CM. This option is needed to enable parallel processing.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (UIMA-1294) Enable access of service's ipaddr from process Cas replies
Enable access of service's ipaddr from process Cas replies
Key: UIMA-1294
URL: https://issues.apache.org/jira/browse/UIMA-1294
Project: UIMA
Issue Type: Improvement
Components: Async Scaleout
Reporter: Eddie Epstein
Assignee: Eddie Epstein
Priority: Minor

Process Cas reply messages contain the service's host ipaddr, but there is no mechanism to retrieve this info. It would also be nice for the sample program, RunRemoteAsyncAE, to show how to access this info and to display it.
[jira] Commented: (UIMA-1293) Replies from remote CasMultipliers that don't always generate CASes are not handled correctly
[ https://issues.apache.org/jira/browse/UIMA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676422#action_12676422 ]

Jerry Cwiklik commented on UIMA-1293:

A remote (primitive) CAS multiplier's process() method returns an input CAS back to the client only if ALL CASes produced by the CM have been released. This is wrong. The correct logic is to release the input CAS when all *its* children have been released. Modified the code to use a child count associated with the input CAS when deciding whether or not to send the input CAS back to the client.

Replies from remote CasMultipliers that don't always generate CASes are not handled correctly
Key: UIMA-1293
URL: https://issues.apache.org/jira/browse/UIMA-1293
Project: UIMA
Issue Type: Bug
Components: Async Scaleout
Reporter: Burn Lewis
Fix For: 2.3
Attachments: UIMA1293.patch

If a remote CM generates one CAS for every N input CASes, some of the childless parents do not continue in the flow. Since the default FC uses dropIfNewCasProduced, all CASes should continue in the flow, except that every N-th one is replaced by its child.
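The per-parent child count described in the comment above might look roughly like the following sketch. This is not the contents of UIMA1293.patch: the class `ParentCasTracker`, its method names, and the use of string ids for CASes are hypothetical, chosen only to show why a count kept per input CAS lets a childless parent be returned while another parent's children are still outstanding.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-parent child counting; not the UIMA AS implementation. */
final class ParentCasTracker {

    private static final class State {
        int outstandingChildren;  // children created for this parent and not yet released
        boolean processingDone;   // true once the CM will produce no more children for it
    }

    private final Map<String, State> states = new HashMap<>();

    synchronized void parentReceived(String parentId) {
        states.put(parentId, new State());
    }

    synchronized void childCreated(String parentId) {
        require(parentId).outstandingChildren++;
    }

    /** Returns true if the parent can now be sent back to the client. */
    synchronized boolean childReleased(String parentId) {
        State s = require(parentId);
        if (s.outstandingChildren == 0) {
            throw new IllegalStateException("no outstanding children for " + parentId);
        }
        s.outstandingChildren--;
        return readyToReturn(parentId, s);
    }

    /** Returns true if the parent can now be sent back to the client. */
    synchronized boolean processingDone(String parentId) {
        State s = require(parentId);
        s.processingDone = true;
        return readyToReturn(parentId, s);
    }

    // The decision looks only at this parent's own children, never at a global count.
    private boolean readyToReturn(String parentId, State s) {
        boolean ready = s.processingDone && s.outstandingChildren == 0;
        if (ready) {
            states.remove(parentId);
        }
        return ready;
    }

    private State require(String parentId) {
        State s = states.get(parentId);
        if (s == null) {
            throw new IllegalArgumentException("unknown parent CAS: " + parentId);
        }
        return s;
    }

    public static void main(String[] args) {
        ParentCasTracker tracker = new ParentCasTracker();
        tracker.parentReceived("p1");
        tracker.parentReceived("p2");
        tracker.childCreated("p2");
        // p1 has no children, so it is returnable even though p2's child is still active.
        System.out.println(tracker.processingDone("p1"));
        System.out.println(tracker.processingDone("p2"));
        System.out.println(tracker.childReleased("p2"));
    }
}
```

With a single global count of active child CASes, the childless parent p1 in this example would be held until p2's child was released, which matches the symptom reported in the issue.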
[jira] Closed: (UIMA-1194) JMX stats for UIMA AS seem inconsistent
[ https://issues.apache.org/jira/browse/UIMA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Cwiklik closed UIMA-1194.
Resolution: Fixed

JMX stats for UIMA AS seem inconsistent
Key: UIMA-1194
URL: https://issues.apache.org/jira/browse/UIMA-1194
Project: UIMA
Issue Type: Bug
Components: Async Scaleout
Reporter: Jerry Cwiklik

The aggregate's JMX stats for a remote delegate seem different from those shown by the delegate's own JMX stats. Specifically, the analysis times are different. These numbers should be the same in both. It appears that the numbers shown in the delegate's stats are always larger.