Re: Running CasMultiplier inside a JCasIterable
No, the issue is still open. When I start working on one of the issues that are still recorded on Google Code, I open a corresponding issue in the Apache Jira and link the two issues to each other. I also set the ASFJira flag on the Google Code tracker to true.

-- Richard

On 05.12.2013, at 02:07, Swirl wrote:

>> Option 2 - let UIMA do the heavy lifting
>>
>> An alternative and much simpler approach might be to create an aggregate which
>> does not only contain the engines, but also the reader. Then you don't have to
>> worry about the reader anymore at all. Just create a UIMA JCasIterator and
>> poll CASes from that until it is empty. Some additional info may be found in
>> the legacy issue 89 [1].
>
> Hi Richard,
> Is the code in issue 89 implemented in uimafit 2.0.0?
> It does not work in uimafit 1.4.0, which I currently have.
Re: Running CasMultiplier inside a JCasIterable
> Option 2 - let UIMA do the heavy lifting
>
> An alternative and much simpler approach might be to create an aggregate which
> does not only contain the engines, but also the reader. Then you don't have to
> worry about the reader anymore at all. Just create a UIMA JCasIterator and
> poll CASes from that until it is empty. Some additional info may be found in
> the legacy issue 89 [1].

Hi Richard,
Is the code in issue 89 implemented in uimafit 2.0.0?
It does not work in uimafit 1.4.0, which I currently have.
Re: Running CasMultiplier inside a JCasIterable
Option 1 - by foot:

I guess the uimaFIT JCasIterator should continue to read CAS by CAS from the reader. However, for each CAS read by the reader, it should be able to return 0-x CASes. Currently it can only return 1 because it calls engine.process(jCas) on each engine in turn. To return 0-x, I think, it would have to create a single aggregate engine from all the engines, call engine.processAndOutputNewCASes(jCas) on that, and handle the UIMA JCasIterator that is returned by it (sorry for two classes having the same name here...). The UIMA JCasIterator would need to become part of the uimaFIT JCasIterator state. Special handling needs to be introduced to make sure the hasNext() method still works, in particular for the case that a CAS produced by the reader does not result in any output CAS.

Option 2 - let UIMA do the heavy lifting

An alternative and much simpler approach might be to create an aggregate which does not only contain the engines, but also the reader. Then you don't have to worry about the reader anymore at all. Just create a UIMA JCasIterator and poll CASes from that until it is empty. Some additional info may be found in the legacy issue 89 [1].

There are probably nasty details, but those should be roughly the general approaches.

Cheers,

-- Richard

[1] https://code.google.com/p/uimafit/issues/detail?id=89

On 04.12.2013, at 01:16, Swirl wrote:

> Richard Eckart de Castilho writes:
>
>> For further reference:
>>
>> https://issues.apache.org/jira/browse/UIMA-3470
>
> Thanks for raising the Jira.
>
> I tried looking at the source code, but I don't think I am able to come up
> with a solution for this.
> Do you have any pointers to get me started?
>
> Thanks.
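The hasNext() handling described under Option 1 above can be sketched in plain Java. This is only an illustration of the iteration logic, not uimaFIT or UIMA code: it uses no UIMA types, the names MultiplyingIterator, MultiplyingIteratorDemo and segments are hypothetical, and the Function argument merely stands in for a call such as engine.processAndOutputNewCASes(jCas). It also ignores CAS lifecycle concerns entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Flattens "one input, 0..n outputs" into a single iterator.
 * I stands in for a CAS read by the reader, O for a CAS emitted by the
 * aggregate (assumption: plain objects instead of real UIMA CASes).
 */
class MultiplyingIterator<I, O> implements Iterator<O> {
    private final Iterator<I> reader;
    // Stand-in for the aggregate: maps one input to an iterator over its outputs.
    private final Function<I, Iterator<O>> multiplier;
    // Outputs of the input currently being processed; part of the iterator state.
    private Iterator<O> current = Collections.emptyIterator();

    MultiplyingIterator(Iterator<I> reader, Function<I, Iterator<O>> multiplier) {
        this.reader = reader;
        this.multiplier = multiplier;
    }

    @Override
    public boolean hasNext() {
        // Keep pulling inputs until one yields output or the reader is exhausted.
        // This covers the case where an input produces no output at all.
        while (!current.hasNext() && reader.hasNext()) {
            current = multiplier.apply(reader.next());
        }
        return current.hasNext();
    }

    @Override
    public O next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}

class MultiplyingIteratorDemo {
    /** Splits each document on '|'; an empty document yields no segments. */
    static List<String> segments(List<String> documents) {
        Iterator<String> it = new MultiplyingIterator<String, String>(
                documents.iterator(),
                doc -> doc.isEmpty()
                        ? Collections.<String>emptyIterator()
                        : Arrays.asList(doc.split("\\|")).iterator());
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Three documents: two segments, none, one.
        System.out.println(segments(Arrays.asList("a|b", "", "c"))); // prints [a, b, c]
    }
}
```

The loop in hasNext() is the part that matters: without it, an input that yields nothing would make the iterator report that it is empty even though the reader still has documents left.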
Re: Running CasMultiplier inside a JCasIterable
Richard Eckart de Castilho writes:

> For further reference:
>
> https://issues.apache.org/jira/browse/UIMA-3470

Thanks for raising the Jira.

I tried looking at the source code, but I don't think I am able to come up with a solution for this.
Do you have any pointers to get me started?

Thanks.
Re: Running CasMultiplier inside a JCasIterable
For further reference:

https://issues.apache.org/jira/browse/UIMA-3470

-- Richard

On 22.11.2013, at 07:37, Richard Eckart de Castilho wrote:

> I believe the JCasIterable is currently implemented as a loop which calls
> "process" on the analysis engines for every CAS produced by the reader
> and then returns the corresponding CAS. This wouldn't work with multipliers.
>
> Can you please file an issue in the Apache Jira, preferably with a minimal
> test case attached? It shouldn't be a big problem to fix this for the next
> release. A patch already fixing this would also work, of course ;)
>
> Cheers,
>
> -- Richard
>
> On 22.11.2013, at 08:01, Swirl wrote:
>
>> I have successfully used CasMultiplier to split up a document into segments
>> for further processing using SimplePipeline.runPipeline().
>> I did this by wrapping the CasMultiplier and the succeeding Annotator within
>> an aggregate.
>>
>> But after simply switching from SimplePipeline.runPipeline() to
>> JCasIterable, the code no longer runs correctly, i.e., it returns only as
>> many CASes as there are physical documents, instead of the segments that I
>> expected.
>>
>> How can I get CasMultiplier to work with a JCasIterable?
Re: Running CasMultiplier inside a JCasIterable
I believe the JCasIterable is currently implemented as a loop which calls "process" on the analysis engines for every CAS produced by the reader and then returns the corresponding CAS. This wouldn't work with multipliers.

Can you please file an issue in the Apache Jira, preferably with a minimal test case attached? It shouldn't be a big problem to fix this for the next release. A patch already fixing this would also work, of course ;)

Cheers,

-- Richard

On 22.11.2013, at 08:01, Swirl wrote:

> I have successfully used CasMultiplier to split up a document into segments
> for further processing using SimplePipeline.runPipeline().
> I did this by wrapping the CasMultiplier and the succeeding Annotator within
> an aggregate.
>
> But after simply switching from SimplePipeline.runPipeline() to
> JCasIterable, the code no longer runs correctly, i.e., it returns only as
> many CASes as there are physical documents, instead of the segments that I
> expected.
>
> How can I get CasMultiplier to work with a JCasIterable?
Running CasMultiplier inside a JCasIterable
I have successfully used CasMultiplier to split up a document into segments for further processing using SimplePipeline.runPipeline(). I did this by wrapping the CasMultiplier and the succeeding Annotator within an aggregate.

But after simply switching from SimplePipeline.runPipeline() to JCasIterable, the code no longer runs correctly, i.e., it returns only as many CASes as there are physical documents, instead of the segments that I expected.

How can I get CasMultiplier to work with a JCasIterable?