The problem could be a byte order mark (BOM), possibly a non-UTF-8 one, at the very start of the file. Try opening the XML file in an editor that lets you choose the encoding, then re-save it as US-ASCII (which has no BOM).
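If no such editor is at hand, the same check can be scripted. The following is only a minimal Python sketch, not part of cTAKES: the helper names `detect_bom` and `strip_bom_to_utf8` are made up for illustration, and it assumes the stray leading bytes really are one of the standard Unicode BOMs. It reports which BOM (if any) starts the data and returns the content re-encoded as UTF-8 without it.

```python
import codecs

# Known BOM byte sequences. UTF-32 comes first because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (bom_bytes, encoding) if data starts with a known BOM, else (b"", None)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return bom, encoding
    return b"", None


def strip_bom_to_utf8(data: bytes) -> bytes:
    """Drop a leading BOM, if present, and return the content encoded as UTF-8."""
    bom, encoding = detect_bom(data)
    if encoding is None:
        return data  # no BOM found: leave the bytes untouched
    text = data[len(bom):].decode(encoding)
    return text.encode("utf-8")


# Hypothetical usage (the file name is taken from the thread; adjust the path):
# with open("testpatient_cn_1.xml", "rb") as f:
#     raw = f.read()
# print(detect_bom(raw)[1])
# with open("testpatient_cn_1.nobom.xml", "wb") as f:
#     f.write(strip_bom_to_utf8(raw))
```

If the file turns out to be UTF-16 or UTF-32, the encoding named in its XML declaration would also need to be changed to match the re-encoded output. If no BOM is reported, the error is more likely caused by some other content before the `<?xml` declaration.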
https://en.wikipedia.org/wiki/Byte_order_mark

Peter

On Wed, Dec 18, 2019 at 11:31 AM Finan, Sean <[email protected]> wrote:

> Sorry - I missed this:
>
> I'm using the two CDA files that come with the cTAKES package
> (testpatient_cn_2.xml and testpatient_cn_1.xml compatible with
> NotesIIST_RTF.DTD
>
> Those files -should- be ok as they were originally used to test the CDA
> workflow.
>
> The code for CdaCasInitializer and ClinicalNotePreProcessor hasn't changed
> since 2015.
>
> The actual error is coming from the 3rd party xml parser (xerces):
> Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>
> I am not sure what would be causing this.
>
> I don't run CDA, so I can't speak to the operational status of those
> components or the pipeline in general.
>
> Does anybody else out there use CDA?
>
> Sean
>
>
> ________________________________________
> From: Finan, Sean <[email protected]>
> Sent: Wednesday, December 18, 2019 2:22 PM
> To: [email protected]
> Subject: Re: cTAKES handling HL7 CDA Level 1 [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> Hi Masoud,
>
> I am not an xml expert, so take this with a grain of salt.
>
> I think that something is wrong/unmatched with the first line of your xml
> document.
> Make sure that the first line is something like:
> <?xml version="1.0" encoding="utf-8"?>
>
> Sean
>
> ________________________________________
> From: Masoud Rouhizadeh <[email protected]>
> Sent: Wednesday, December 18, 2019 1:47 PM
> To: [email protected]
> Subject: Re: cTAKES handling HL7 CDA Level 1 [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi all,
>
> I'm using cTAKES user to process CDA documents by
> AggregateCdaProcessor.xml and AggregateCdaUMLSProcessor.xml located in
> /desc/ctakes-clinical-pipeline/desc/analysis_engine/
>
> My script to call this is
>
> java -Dctakes.umlsuser= -Dctakes.umlspw= -cp
> $CTAKES_HOME/lib/*:$CTAKES_HOME/desc/:$CTAKES_HOME/resources/
> -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g
> org.apache.ctakes.core.cpe.CmdLineCpeRunner
> $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_cda_masoud.xml
>
> test_cda_masoud.xml has a proper path to CDA input and output. I'm using
> the two CDA files that come with the cTAKES package (testpatient_cn_2.xml
> and testpatient_cn_1.xml compatible with NotesIIST_RTF.DTD).
>
> Unfortunately, it seems that CdaCasInitializer cannot run, and I get the
> attached errors. I get the same errors when using the GUI with
> AggregateCdaProcessor AE
>
> - Am I missing something obvious?
> - Does cTAKES *User* installation handle CDA documents?
> - Is org.apache.ctakes.core.cpe.CmdLineCpeRunner an appropriate pipeline
> for CdaCasInitializer?
>
> Thank you so much for your help in advance.
>
> Masoud
>
>
> On 11/8/19, 8:30 AM, "Finan, Sean" <[email protected]>
> wrote:
>
>
> Hi Masoud,
>
> I think that the CdaCasInitializer is at least 10 years old. I would
> not expect it to conform to any recent standards.
>
> Does anybody else have a reader or transformer that can handle HL7 CDA
> r2?
>
> Sean
>
> p.s.
> If anybody is involved with HL7 International, you may want to get
> some movement on addressing the typo on the page header(s):
>
> Section 1a: Clinical Document Architcture (CDA®)
>
> ________________________________________
> From: Masoud Rouhizadeh <[email protected]>
> Sent: Thursday, November 7, 2019 5:59 PM
> To: [email protected]
> Subject: cTAKES handling HL7 CDA Level 1 [EXTERNAL]
>
> Dear cTAKES developer mailing list,
>
> We have been working on a project at Hopkins for converting
> Epic-generated RTF notes into Clinical Document Architecture Level One.
>
> We have been using HL7 CDA® Release 2 Schema, and now we plan to use
> cTAKES for concept extraction from those documents. The CDA Schema and
> examples can be found here
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.hl7.org_implement_standards_product-5Fbrief.cfm-3Fproduct-5Fid-3D7&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=h8q4BiKKL6eDBOGEta7gcpkDGIx5xFPlGrNfUPlzBuc&s=l8HjgDHeywmdkSUkOJBGWNLpJ-bPlw7Lmgzh02w8k2s&e=
>
> In the cTAKES documentation, I see that CdaCasInitializer "does not
> handle all CDA documents. The CDA document must conform to the DTD
> resources/cda/NotesIIST_RTF.DTD."
>
> Has anyone tested and evaluated cTAKES ability to consume HL7 CDA
> Level 1 Release 2 documents?
>
> Thank you,
> Masoud
>
> ----
> Masoud Rouhizadeh, PhD
> Faculty - Division of Health Science Informatics (DHSI)
> NLP Lead - Institute for Clinical and Translational Research (ICTR)
> Johns Hopkins University School of Medicine
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cs.jhu.edu_-7Emrou_&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=h8q4BiKKL6eDBOGEta7gcpkDGIx5xFPlGrNfUPlzBuc&s=8fvrQoIy8orWYKCJoob5Z0Sbbioe5xyiN7pDMTzImOc&e=
