On 23.10.2013 16:34, Marshall Schor wrote:
> On 10/23/2013 8:36 AM, Peter Klügl wrote:
>> Is it correct that the type system may not change if the analysis engine
>> implementation extends JCasAnnotator_ImplBase? I somehow miss the method
>> typeSystemInit(). Hmm, should I really switch to CasAnnotator_ImplBase,
>> or have I missed something?
> I think the type system is "equal" for these 2 CASes, but not "==", since the
> "failing" case recreates a new CAS from the identical metadata.

The types were not equal enough for a HashMap :-)

> UIMA is designed with the lifecycle: 1) assemble / configure pipeline,
> including merging type systems; 2) use the internal Java objects that were
> created in (1) to process multiple work-items, typically by reusing CASes (via
> the reset()) or by getting new CASes from the AnalysisEngine representing the
> top level of the pipeline using "analysisEngine.newJCas()" or
> "analysisEngine.newCAS()". This produces new CASes where the type system impl
> objects are == (identical).
>
> Approaches which produce type system objects which are equal but not == should
> be discouraged.
>
> You could probably easily detect when a user passes a CAS where the type
> system is not ==, and redo your internal setups...

Stupid me. Yes, I did that now. Thanks :-)

Peter

> -Marshall
>
>
>> Peter
>>
>> On 23.10.2013 14:35, Peter Klügl (JIRA) wrote:
>>> [
>>> https://issues.apache.org/jira/browse/UIMA-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802850#comment-13802850
>>> ]
>>>
>>> Peter Klügl commented on UIMA-3357:
>>> -----------------------------------
>>>
>>> Thanks for reporting this. I added a test for now.
>>>
>>> The problem is that the type system has changed, at least its
>>> representation in Java, but nobody told the analysis engine about it. On
>>> the one hand, the environment of the script stores the known types. This is
>>> initiated by {{initializeTypes()}} either if the analysis engine was not
>>> initialized yet or if the analysis engine is forced to update itself with
>>> each process call (parameter reloadScript). On the other hand, the internal
>>> "indexing" (begin and end map in RutaBasic) uses the current CAS, its
>>> annotations and their types. So we have different type objects that cause
>>> problems.
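
[Note on the comment above] The "different type objects" problem can be illustrated without UIMA at all. The following is only a minimal sketch: `FakeType` and `buildTypeSystem()` are hypothetical stand-ins, not UIMA or Ruta classes, and it assumes (as the thread suggests) that type objects used as map keys are compared by identity. An index keyed by the type objects of one type system is then queried with an equally named type object from a second, separately built type system.

```java
import java.util.HashMap;
import java.util.Map;

public class TypeIdentityDemo {

    // Hypothetical stand-in for a type object. It does NOT override
    // equals()/hashCode(), so HashMap compares keys by object identity.
    public static final class FakeType {
        final String name;

        FakeType(String name) {
            this.name = name;
        }
    }

    // Hypothetical stand-in for creating a type system from the same metadata.
    // Every call creates fresh type objects with identical names.
    public static Map<String, FakeType> buildTypeSystem() {
        Map<String, FakeType> types = new HashMap<>();
        types.put("CW", new FakeType("CW"));
        return types;
    }

    // True if the index has an entry for exactly this type object.
    public static boolean lookupSucceeds(Map<FakeType, String> index, FakeType key) {
        return index.containsKey(key);
    }

    public static void main(String[] args) {
        Map<String, FakeType> first = buildTypeSystem();   // types the engine was set up with
        Map<String, FakeType> second = buildTypeSystem();  // types of a freshly created CAS

        // Index keyed by the type objects known at setup time.
        Map<FakeType, String> index = new HashMap<>();
        index.put(first.get("CW"), "annotations of type CW");

        System.out.println(lookupSucceeds(index, first.get("CW")));   // prints "true"
        System.out.println(lookupSucceeds(index, second.get("CW")));  // prints "false"
    }
}
```

The second lookup misses even though both objects describe the same type, which matches the symptom described for a newly created CAS.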
>>>
>>>
>>>> CONTAINS fails when running script as AE in a pipeline with a new CAS
>>>> ---------------------------------------------------------------------
>>>>
>>>>                 Key: UIMA-3357
>>>>                 URL: https://issues.apache.org/jira/browse/UIMA-3357
>>>>             Project: UIMA
>>>>          Issue Type: Bug
>>>>          Components: ruta, uimaFIT
>>>>    Affects Versions: 2.0.1ruta, 2.1.0ruta
>>>>            Reporter: Daniel Maeurer
>>>>            Assignee: Peter Klügl
>>>>            Priority: Minor
>>>>
>>>> When running my Ruta script as an analysis engine in a pipeline, it does
>>>> not work correctly when creating a new CAS and processing the pipeline a
>>>> second time with the new CAS.
>>>> While reusing the old cas with "cas.reset()" is working, creating a new
>>>> CAS results in failing rules including "CONTAINS" in the ruta script.
>>>> The ruta script used in the example:
>>>> {code:title=mystic.ruta|borderStyle=solid}
>>>> PACKAGE de.tudarmstadt.algo.vpino.ruta;
>>>> DECLARE test;
>>>> Document{CONTAINS(CW)->MARK(test)};
>>>> {code}
>>>> The following Java class can reproduce the error. It creates four xmi
>>>> files. The last xmi file is missing the annotations created with rules
>>>> including "CONTAINS".
>>>> {code:title=MysticPipe.java|borderStyle=solid}
>>>> package org.uimafit.pipeline;
>>>>
>>>> import java.io.File;
>>>> import java.io.FileOutputStream;
>>>> import java.io.IOException;
>>>> import java.io.OutputStream;
>>>> import java.util.ArrayList;
>>>> import java.util.List;
>>>>
>>>> import org.apache.uima.UIMAFramework;
>>>> import org.apache.uima.analysis_engine.AnalysisEngine;
>>>> import org.apache.uima.analysis_engine.AnalysisEngineDescription;
>>>> import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
>>>> import org.apache.uima.cas.CAS;
>>>> import org.apache.uima.cas.impl.XmiCasSerializer;
>>>> import org.apache.uima.fit.factory.AnalysisEngineFactory;
>>>> import org.apache.uima.fit.pipeline.SimplePipeline;
>>>> import org.apache.uima.resource.ResourceInitializationException;
>>>> import org.apache.uima.resource.metadata.ResourceMetaData;
>>>> import org.apache.uima.util.CasCreationUtils;
>>>> import org.apache.uima.util.InvalidXMLException;
>>>> import org.apache.uima.util.XMLInputSource;
>>>> import org.apache.uima.util.XMLSerializer;
>>>> import org.xml.sax.SAXException;
>>>>
>>>> public class MysticPipe {
>>>>
>>>>   public static void main(String[] args) throws Exception {
>>>>     working("This is a test.", initPipeline());
>>>>     failing("This is a test.", initPipeline());
>>>>   }
>>>>
>>>>   private static AnalysisEngine initPipeline() throws
>>>>       ResourceInitializationException, IOException, InvalidXMLException {
>>>>     File specFile = new
>>>>         File("./descriptor/de/tudarmstadt/algo/vpino/ruta/mysticEngine.xml");
>>>>     XMLInputSource in = new XMLInputSource(specFile);
>>>>     AnalysisEngineDescription ruta = (AnalysisEngineDescription)
>>>>         UIMAFramework.getXMLParser().parseResourceSpecifier(in);
>>>>     return AnalysisEngineFactory.createEngine(ruta);
>>>>   }
>>>>
>>>>   private static void working(String input, AnalysisEngine theEngine)
>>>>       throws ResourceInitializationException, AnalysisEngineProcessException,
>>>>       IOException, SAXException {
>>>>     final List<ResourceMetaData> metaData = new
>>>>         ArrayList<ResourceMetaData>();
>>>>     metaData.add(theEngine.getMetaData());
>>>>     final CAS cas = CasCreationUtils.createCas(metaData);
>>>>     System.out.println("create a new cas...");
>>>>     cas.setDocumentLanguage("de");
>>>>     cas.setDocumentText(input);
>>>>     SimplePipeline.runPipeline(cas, theEngine);
>>>>     writeXmiFile(cas, "works_test1");//CHECK
>>>>     //THE DIFFERENCE
>>>>     cas.reset();
>>>>     //END DIFFERENCE
>>>>     System.out.println("create a new cas...");
>>>>     cas.setDocumentLanguage("de");
>>>>     cas.setDocumentText(input);
>>>>     SimplePipeline.runPipeline(cas, theEngine);
>>>>     writeXmiFile(cas, "works_test2");//CHECK
>>>>   }
>>>>
>>>>   private static void failing(String input, AnalysisEngine theEngine)
>>>>       throws ResourceInitializationException, AnalysisEngineProcessException,
>>>>       IOException, SAXException {
>>>>     final List<ResourceMetaData> metaData = new
>>>>         ArrayList<ResourceMetaData>();
>>>>     metaData.add(theEngine.getMetaData());
>>>>     final CAS cas = CasCreationUtils.createCas(metaData);
>>>>     System.out.println("create a new cas...");
>>>>     cas.setDocumentLanguage("de");
>>>>     cas.setDocumentText(input);
>>>>     SimplePipeline.runPipeline(cas, theEngine);
>>>>     writeXmiFile(cas, "works_test3"); // CHECK
>>>>     //THE DIFFERENCE
>>>>     final CAS cas2 = CasCreationUtils.createCas(metaData);
>>>>     //END DIFFERENCE
>>>>     System.out.println("create a new cas...");
>>>>     cas2.setDocumentLanguage("de");
>>>>     cas2.setDocumentText(input);
>>>>     SimplePipeline.runPipeline(cas2, theEngine);
>>>>     writeXmiFile(cas2, "fail_test4"); //FAIL
>>>>     return;
>>>>   }
>>>>
>>>>   public static void writeXmiFile(CAS aCas, String Fname) throws
>>>>       IOException, SAXException {
>>>>     File outFile = new File("output", Fname + ".xmi");
>>>>     OutputStream out = null;
>>>>     try {
>>>>       // out = new StringOutputStream();
>>>>       out = new FileOutputStream(outFile);
>>>>       XmiCasSerializer ser = new
>>>>           XmiCasSerializer(aCas.getTypeSystem());
>>>>       XMLSerializer xmlSer = new XMLSerializer(out, false);
>>>>       ser.serialize(aCas, xmlSer.getContentHandler());
>>>>     } finally {
>>>>       if (out != null) {
>>>>         out.close();
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>> {code}
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.1#6144)
>>>
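
[Note] The workaround discussed at the top of the thread (detect that the incoming CAS carries a type system that is not == to the one seen before, then redo the internal setup) might look roughly like the sketch below. This is not the actual Ruta fix: `FakeTypeSystem`, `TypeSystemChangeGuard` and `beforeProcess()` are hypothetical names used only for illustration. In a real annotator the object compared would presumably be the result of `cas.getTypeSystem()`, and the rebuild step would be something like the `initializeTypes()` call mentioned in the JIRA comment.

```java
public class TypeSystemChangeGuard {

    // Hypothetical stand-in for a type system object.
    public static final class FakeTypeSystem {
    }

    private FakeTypeSystem lastSeen;  // type system used for the last setup
    private int setupCount;           // how often the internal setup was (re)built

    // Call at the start of each process() call.
    // Returns true if the internal setup had to be rebuilt.
    public boolean beforeProcess(FakeTypeSystem current) {
        // Deliberate identity comparison: an "equal" but different
        // type system object must still trigger a rebuild.
        if (current != lastSeen) {
            lastSeen = current;
            setupCount++;  // stand-in for re-running the type initialization
            return true;
        }
        return false;
    }

    public int setupCount() {
        return setupCount;
    }

    public static void main(String[] args) {
        TypeSystemChangeGuard guard = new TypeSystemChangeGuard();
        FakeTypeSystem ts1 = new FakeTypeSystem();
        FakeTypeSystem ts2 = new FakeTypeSystem();

        System.out.println(guard.beforeProcess(ts1));  // prints "true"  (first CAS)
        System.out.println(guard.beforeProcess(ts1));  // prints "false" (same CAS after reset())
        System.out.println(guard.beforeProcess(ts2));  // prints "true"  (newly created CAS)
        System.out.println(guard.setupCount());        // prints "2"
    }
}
```

On the caller's side, the lifecycle Marshall describes avoids the rebuild entirely: obtaining the second CAS from the engine itself (`theEngine.newCAS()`) instead of calling `CasCreationUtils.createCas(metaData)` again should, per his description, yield type system objects that are ==.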